Describe the numerical abundance of microbial life in relation to ecology and biogeochemistry of Earth systems.
This article asks and attempts to answer the following questions:
What is the total number of prokaryotic cells in the 3 main habitats and the cumulative total on the planet?
What is the total amount of carbon contained in prokaryotic cells in the 3 main habitats and the global total?
What is the average turnover rate and productivity of prokaryotes in the 3 main habitats and the global total?
Cell density data from samples of the 3 main representative habitats (aquatic, soil and subsurface environments) were analyzed and total number estimates were generated. Aquatic numbers were estimated from mean sample values of several primary studies. Soil numbers were estimated based on cultivated soil sample numbers. Subsurface numbers were extrapolated from various studies’ sample data, and also estimated from the average porosity of terrestrial subsurface environments, the average subsurface prokaryotic cell volume, and groundwater sample data. Carbon content was estimated in soil and subsurface environments as half the dry weight of the average prokaryotic cell. In aquatic environments, the upper estimate of carbon per cell was used.
The total number of prokaryotes on Earth is estimated to be 4-6x10^30 cells and the total amount of prokaryotic carbon is estimated to be 350-550 Pg, which is 60-100% of the total carbon in plants. Prokaryotes also cumulatively contain about 10 times as much phosphorus and nitrogen as plants. Heterotrophic prokaryote turnover is fastest in the upper 200m of the ocean and slowest in subsurface environments. The estimated global cellular production rate is 1.7x10^30 cells/yr. This high production rate also creates a large capacity for genetic diversification through mutation.
How can subsurface estimation methods be refined to more accurately characterize the subsurface prokaryotic world?
How can phylogenetic analysis methods be changed to account for the high degree of diversity and mutation rate in prokaryotes?
Given the differences in prokaryotic genomes and their evolution through mutation, how do we define a prokaryotic species?
Comment on the emergence of microbial life and the evolution of Earth systems
+ 4.6 billion years ago - Formation of the Solar System from a local accretion disk, creating the Sun and the planets.
+ 4.2 billion years ago - Formation of the oceans, creating the land-sea surface we know today. There is some evidence that plate tectonics started at this time, as well as the first amino acids and RNA.
+ 3.8 billion years ago - Earliest evidence of life in the form of cells.
+ 3.75 billion years ago - A group splits off from the last common ancestor and forms the domain, Archaea.
+ 3.5 billion years ago - Evolution of the domain Bacteria and of photosynthesis. There is fossil evidence for the formation of microbial aggregations and biofilms.
+ 3.0 billion years ago - First evidence for the presence of viruses
+ 2.7 billion years ago - Evolution of cyanobacteria
+ 2.2 billion years ago - The Great Oxygenation Event, brought upon by massed photosynthesis by cyanobacteria.
+ 2.1 billion years ago - Evolution of Eukarya and later (2.0 Gya), evidence of endosymbiosis by Eukarya to form chloroplasts and mitochondria
+ 1.3 billion years ago - Lineages that eventually formed plant, animal and fungal kingdoms split off from the main eukaryote ancestral line
+ 550 million years ago - First land plants evolved
+ 200 million years ago - Evolution of mammals
+ Hadean - The early Hadean might have been characterized by global glaciation due to a weak, young Sun. Later, a very hot atmosphere comprised mainly of water vapour and CO2 formed, with a silica crust on the surface
+ Archean - About 3 times hotter than the current Earth, with a liquid water acidic ocean and a mostly CO2 atmosphere
+ Precambrian - The evolution of photosynthesis and the massed expansion of cyanobacteria caused a huge increase in atmospheric oxygen, called the Great Oxygenation Event
+ Proterozoic - Oxygen rich, hot atmosphere
+ Phanerozoic - Evolution of land plants contributes to a further increase in atmospheric oxygen levels. The Earth is also slightly cooler.
Evaluate human impacts on the ecology and biogeochemistry of Earth systems.
What parameters can be used to define particular aspects of the Earth that are at risk from anthropogenic effects?
Have humans created imbalances in the modern nitrogen and phosphorus cycles, and how would this affect the cycles of the future?
How will the engineered Anthropocene affect global biodiversity and the climate?
Rockstrom established a framework of planetary boundaries to distinguish aspects of the Holocene that are at risk, focusing primarily on climate change, rate of biodiversity loss and the nitrogen cycle. Climate, biodiversity and atmospheric data from literature sources are taken and analyzed.
9 planetary boundaries were defined: climate change, rate of biodiversity loss, the nitrogen and phosphorus cycles (counted together as one boundary), stratospheric ozone depletion, ocean acidification, global freshwater usage, change in land use, atmospheric aerosol loading and chemical pollution. The first 3 of these (climate change, biodiversity loss and the nitrogen cycle) have already exceeded their safe limits, with the phosphorus cycle and ocean acidification approaching theirs. Recently, there has been evidence for a shift in the Earth’s climate away from the Holocene state. The current rate of biodiversity loss is unsustainable and, if not reduced, will lead to irreversible ecosystem erosion. Excess nitrogen and phosphorus inflow to oceans is causing dangerous levels of acidification and anoxia.
What is the extent to which the Earth can sustain the effect of humans without compromising its climate, biodiversity and ecosystems?
How can we quantify these planetary boundaries?
How long does it take to cause irreversible damage to Earth’s natural systems, and how long will it take following remediation for these systems to recover?
Describe the numerical abundance of microbial life in relation to the ecology and biogeochemistry of Earth systems.
The 3 primary prokaryotic habitats are aquatic, soil and subsurface environments, each with its own sub-habitats. Aquatic environments encompass a huge range of habitats, from oxic freshwater to anoxic saltwater, with great variation in chemical composition. Marine environments contain about 1.181x10^29 prokaryotes. Soil environments are a major reservoir of organic carbon and other elements; the microbes that inhabit them contribute significantly to global food webs and number about 2.556x10^29 cells. Subsurface environments include both terrestrial and marine sediments below a certain depth. Much of the subsurface cell count is extrapolated from estimates, but it comes to about 3.8x10^30 cells.
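As a quick check on how these habitat figures combine, the sketch below sums the three counts quoted above and reports each habitat's share. The variable names are illustrative only, and the marine figure is used to stand in for the aquatic habitat as a whole, which is an assumption.

```r
# Cell counts as quoted in the notes above (marine figure used for "aquatic")
cells <- c(aquatic = 1.181e29, soil = 2.556e29, subsurface = 3.8e30)

total_cells <- sum(cells)               # combined estimate across the 3 habitats
share_pct <- 100 * cells / total_cells  # percentage contributed by each habitat

print(total_cells)
print(round(share_pct, 1))
```

With these inputs the sum comes to roughly 4.2x10^30 cells, and the subsurface accounts for about 91% of it, which is why refining subsurface estimates matters so much for the global total.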
The total number of cells in the upper 200m of the ocean is 3.6x10^28, of which 2.9x10^27 are autotrophic cyanobacteria, including the genus Prochlorococcus. This represents approximately 8% of the cells in this depth range. This suggests that a large proportion of carbon fixation occurs in the ocean, that oceanic autotrophs are influential in the Earth’s carbon cycle, and that these prokaryotes are also a significant reservoir of organic carbon.
Autotroph - Able to fix atmospheric and dissolved CO2 into organic carbon for biological use, using light or chemical energy
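The autotroph share quoted above can be reproduced directly from the two counts. This is a minimal sketch with illustrative variable names.

```r
upper_ocean_cells <- 3.6e28    # all prokaryotic cells in the upper 200m
cyanobacteria_cells <- 2.9e27  # autotrophic cyanobacteria in the same layer

# Fraction of upper-ocean cells that are autotrophic, as a percentage
autotroph_share_pct <- 100 * cyanobacteria_cells / upper_ocean_cells
print(round(autotroph_share_pct, 1))
```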
Heterotroph - Use organic carbon as their source of metabolic energy
Lithotroph - Make use of inorganic substrates to obtain reducing equivalents
The deepest habitat that could support prokaryotic life in terrestrial subsurface sediments is roughly 4000m. In marine subsurface sediments, the maximum depth is unlikely to be below that of the Mariana Trench, which is ~10900m deep. This is because at these depths, the temperature is too high to maintain DNA and proteins without denaturing them.
The highest habitat capable of supporting prokaryotic life is the mesosphere, up to 48km in altitude. The primary limiting factor is the lack of organic material and reducing equivalents, making adequate acquisition of nutrients for metabolism very difficult.
The absolute limit of the biosphere is 77km, where the mesosphere ends. Microscopic fungi, their spores and some prokaryotic cells have been observed at altitudes close to this value.
Annual production is calculated by multiplying population by turnovers per year.
Example calculation of cells produced per year by marine heterotrophs above 200m (365 days divided by a turnover time of 16 days):
((3.6*10^28)*365)/16
## [1] 8.2125e+29
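The same calculation can be wrapped in a small helper so it can be reused for other habitats. This is only a sketch: `annual_production` is a hypothetical name, and it assumes the 16 in the expression above is a turnover time in days.

```r
# Annual cell production = population x turnovers per year,
# where turnovers per year = days_per_year / turnover time in days.
annual_production <- function(population, turnover_days, days_per_year = 365) {
  stopifnot(turnover_days > 0)
  population * days_per_year / turnover_days
}

# Marine heterotrophs above 200m, same inputs as the calculation above
marine_upper <- annual_production(population = 3.6e28, turnover_days = 16)
print(marine_upper)
```

Passing a longer turnover time, as expected for subsurface habitats, lowers the annual production for the same population size.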
This relationship is largely dependent on the ability of light to penetrate the ocean depths. At depths below 200m, light struggles to penetrate, decreasing autotrophic productivity. In terrestrial habitats, light is severely limited below the surface, hence the low primary productivity, carbon assimilation and turnover rate.
Carbon budget for marine heterotrophs in the top 200m of the ocean:
Carbon assimilation efficiency: 20%
Carbon per cell: 5-20 fg C/cell; using the upper value, 20 fg C/cell = 20x10^-30 Pg C/cell
Standing stock: (3.6x10^28 cells) x (20x10^-30 Pg C/cell) = 0.72 Pg C
Carbon required per turnover: 4 x 0.72 Pg C = 2.88 Pg C
Carbon available: 85% of 51 Pg C/yr is consumed = 43 Pg C/yr
Turnovers per year: (43 Pg C/yr)/(2.88 Pg C) = 14.9, or 1 turnover every 24.5 days
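The carbon-based turnover estimate can be traced step by step in R. The sketch below assumes 20 fg C per cell (the upper value quoted), keeps the factor of 4 used in the notes, and uses the rounded 43 Pg C/yr; the variable names are illustrative.

```r
cells <- 3.6e28              # heterotroph cells in the upper 200m
carbon_per_cell_g <- 20e-15  # assumed upper estimate: 20 fg C per cell
g_per_Pg <- 1e15             # grams in one petagram

# Carbon held in the standing population
standing_stock_Pg <- cells * carbon_per_cell_g / g_per_Pg

# Carbon needed for one full turnover (factor of 4 as used in the notes)
carbon_per_turnover_Pg <- 4 * standing_stock_Pg

# Carbon consumed per year in this layer (85% of 51 Pg C/yr, rounded to 43)
available_Pg_per_yr <- 43

turnovers_per_year <- available_Pg_per_yr / carbon_per_turnover_Pg
days_per_turnover <- 365 / turnovers_per_year

print(c(standing_stock_Pg, turnovers_per_year, days_per_turnover))
```

Without intermediate rounding this gives a little under 15 turnovers per year, or one turnover roughly every 24 to 25 days, in line with the figures above.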
The high mutation rate of prokaryotic cells, combined with large populations, allows prokaryotes to exhibit a high level of genetic diversity which, together with horizontal gene transfer, leads to exceptional adaptability to changing environmental conditions. This means that under high nutritional and/or environmental selective pressures, small sub-populations will accumulate sufficient mutations and genes to survive and reproduce.
With the huge degree of prokaryotic abundance and genetic diversity, the metabolic potential of the collective microbial biosphere is enormous. Also, given the high mutation rate, it is likely that any new natural or manmade chemical compounds will quickly become metabolic substrates for certain prokaryotes, creating new ecological niches.
Git Bash
RStudio
GitHub
command: git config --global user.name "Your Name"
command: git config --global user.email "youremail@email.com"
command: cd ~/Documents
command: pwd
command: git clone https://github.com/EDUCE-UBC/MICB425 MICB425_materials
command: git status
command: git fetch
command: git pull
command: mkdir MICB425_portfolio
command: touch ID.txt
command: git init
command: git add .
command: git commit -m "First commit"
command: git remote add origin https://remote_repository_URL
command: git remote -v
command: git push -u origin master
The following assignment is an exercise for the reproduction of this .html document using the RStudio and Rmarkdown tools we’ve shown you in
class. Hopefully by the end of this, you won’t feel at all the way this poor PhD student does. We’re here to help, and when it comes to R, the
internet is a really valuable resource. This open-source program has all kinds of tutorials online.
http://phdcomics.com/ Comic posted 1-17-2018
The goal of this R Markdown html challenge is to give you an opportunity to play with a bunch of different RMarkdown formatting. Consider it a
chance to flex your RMarkdown muscles. Your goal is to write your own RMarkdown that rebuilds this html document as close to the original as
possible. So, yes, this means you get to copy my irreverent tone exactly in your own Markdowns. It’s a little window into my psyche. Enjoy =)
hint: go to the PhD Comics website to see if you can find the image above
If you can’t find that exact image, just find a comparable image from the PhD Comics website and include it in your markdown
Let’s be honest, this header is a little arbitrary. But show me that you can reproduce headers with different levels please. This is a level 3 header, for your reference (you can most easily tell this from the table of contents).
Perhaps you’re already confused by the whole markdown thing. Maybe you’re so confused you’ve forgotten how to add. Never fear! A ~~calculator~~ R is here:
1231521+12341556
## [1] 13573077
Or maybe, after you’ve added those numbers, you feel like it’s about time for a table!
I’m going to leave all the guts of the coding here so you can see how libraries (R packages) are loaded into R (more on that later). It’s not terribly
pretty, but it hints at how R works and how you will use it in the future. The summary function used below is a nice data exploration function that
you may use in the future.
library(knitr)
kable(summary(cars), caption = "I made this table with kable in the knitr package library")
| speed | dist |
|---|---|
| Min. : 4.0 | Min. : 2.00 |
| 1st Qu.:12.0 | 1st Qu.: 26.00 |
| Median :15.0 | Median : 36.00 |
| Mean :15.4 | Mean : 42.98 |
| 3rd Qu.:19.0 | 3rd Qu.: 56.00 |
| Max. :25.0 | Max. :120.00 |
And now you’ve almost finished your first RMarkdown! Feeling excited? We are! In fact, we’re so excited that maybe we need a big finale eh?
Here’s ours! Include a fun gif of your choice!
Discuss the role of microbial diversity and formation of coupled metabolism in driving global biogeochemical cycles.
Geophysical processes such as the erosion and weathering of rocks, volcanism and plate tectonics helped create and sustain key elemental cycles such as those of mineral ions and metals, while biogeochemical processes mediated by microorganisms, such as the nitrogen, phosphorus and carbon cycles, support the aforementioned geophysical processes. Abiotic processes are often very slow and can take thousands of years to complete some cycles, involving low energy changes, while biotic processes can be relatively rapid and involve frequent energy transformations. Abiotic processes supply a continuous stream of elements to the food chain and biotic processes return these elements to the Earth to be recycled.
Due to the production line-esque nature of most biogeochemical cycles, with different groups of organisms being responsible for specific steps in each cycle, as well as the accompanying abiotic geophysical processes, the redox state of the Earth arises as an emergent property. The collective function of all these processes allows for a continuous cycle of reduction and oxidation. If these processes occurred independently, this would not be achieved due to a massive degree of energy inefficiency and redundancy.
Depending on the environmental conditions, the laws of thermodynamics can favour electron transfer between different substrates in the forward or reverse direction. The presence of specific compounds can directly influence the most thermodynamically favourable direction of electron transfer. For example, in reducing environments containing H2S or CO, oxidation is unlikely to occur. This results in the creation of unique ecological niches ranging from the massive oxic upper 200m of the ocean to the microscopic anoxic soil crumb interior. Microbes overcome thermodynamic barriers by possessing enzymes and catalysts that drastically lower these energy barriers, allowing electron flow to be reversed under specific conditions and giving greater metabolic flexibility.
The metabolic genes involved in the nitrogen cycle are widespread and heavily transferred through HGT. Over time, certain microbial groups have become specialized in their particular role, for example Rhizobium in nitrogen fixation and Planctomycetes in anammox reactions. These specializations evolved to reduce genetic redundancy and energy wastage within ecosystems. The nitrogen cycle has profound effects on climate change. N2O is a powerful greenhouse gas and its release or reduction into N2 is directly controlled by a niche group of denitrifying bacteria. Disruptions to this process or to the microbes responsible would lead to a higher rate of global warming and climate change.
Much of microbial diversity arises from changes in metabolic diversity driven by environmental and nutritional selective pressures. Through environmental sampling and phylogenetic analysis, researchers can identify new protein families based on evolutionary lineages and by tracking spontaneous mutations and HGT events. Unique environments provide a wealth of genetic information for describing novel metabolic enzymes and pathways.
R code below is functional but not placed in a code chunk due to the size of the data display
library(tidyverse)
read.table(file="Saanich.OTU.txt", header=TRUE, row.names=1, sep="", na.strings=c("NAN", "NA", "."))
OTU = read.table(file="Saanich.OTU.txt", header=TRUE, row.names=1, sep="")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
read.table(file="Saanich.metadata.txt", header=TRUE, row.names=1, sep="\t", na.strings=c("NAN", "NA", "."))
## Depth_m O2_uM PO4_uM SiO2_uM NO3_uM NH4_uM Std_NH4_uM
## SI072_S3_010 10 216.667 0.520 20.672 1.793 0.4080 0.0084
## SI072_S3_020 20 159.672 1.817 32.888 13.736 2.6551 0.0044
## SI072_S3_040 40 141.778 2.244 38.980 22.013 1.3007 0.0087
## SI072_S3_060 60 97.894 2.598 47.335 28.434 0.2652 0.0047
## SI072_S3_075 75 44.978 3.190 60.986 30.604 0.1281 0.0227
## SI072_S3_085 85 25.807 3.745 70.963 28.731 0.1780 0.0070
## SI072_S3_090 90 27.011 3.934 67.897 26.780 0.1307 0.0054
## SI072_S3_097 97 34.436 3.942 67.645 26.248 0.1371 0.0061
## SI072_S3_100 100 38.012 3.672 69.062 26.400 0.1344 0.0092
## SI072_S3_110 110 27.557 3.965 69.577 24.355 0.1046 0.0252
## SI072_S3_120 120 32.354 4.090 64.383 21.302 0.1782 0.0100
## SI072_S3_135 135 20.446 4.342 70.321 15.917 0.1296 0.0166
## SI072_S3_150 150 0.000 4.988 70.780 5.278 2.1754 0.0293
## SI072_S3_165 165 0.000 5.599 62.580 0.000 4.7095 0.2112
## SI072_S3_185 185 0.000 5.974 64.763 0.000 6.4038 0.4384
## SI072_S3_200 200 0.000 6.298 66.200 0.000 7.3582 0.2816
## NO2_uM Std_NO2_uM H2S_uM Std_H2S_uM Cells.ml N2O_nM
## SI072_S3_010 0.1275 0.0088 0.0000 0.0000 NA 0.849
## SI072_S3_020 0.4938 0.0017 0.0000 0.0000 NA 13.199
## SI072_S3_040 0.3775 0.0052 0.0000 0.0000 NA 12.829
## SI072_S3_060 0.0471 0.0245 0.0000 0.0000 NA 12.306
## SI072_S3_075 0.0532 0.0052 0.0000 0.0000 NA 13.896
## SI072_S3_085 0.0966 0.0105 0.0000 0.0000 NA 12.959
## SI072_S3_090 0.0186 0.0192 0.0000 0.0000 NA 15.551
## SI072_S3_097 0.0297 0.0105 0.0000 0.0000 NA 18.682
## SI072_S3_100 0.0817 0.0105 0.0000 0.0000 NA 18.087
## SI072_S3_110 0.0619 0.0105 0.0000 0.0000 NA 15.843
## SI072_S3_120 0.0978 0.0018 0.0000 0.0000 NA 16.304
## SI072_S3_135 0.0706 0.0018 0.0000 0.0000 NA 12.909
## SI072_S3_150 0.1127 0.0018 0.0000 0.0000 NA 11.815
## SI072_S3_165 0.0805 0.0053 3.5027 0.0423 NA 6.310
## SI072_S3_185 0.0211 0.0158 11.6470 0.0827 NA 0.000
## SI072_S3_200 0.0000 0.0000 17.9867 0.0006 NA 0.000
## Std_N2O_nM CH4_nM Std_CH4_nM Temperature_C
## SI072_S3_010 0.114 1030.478 3.070 12.854
## SI072_S3_020 0.000 29.012 0.000 11.005
## SI072_S3_040 1.509 37.146 2.695 9.536
## SI072_S3_060 0.524 36.501 3.521 8.540
## SI072_S3_075 1.417 24.013 0.435 8.480
## SI072_S3_085 0.955 7.376 0.029 8.538
## SI072_S3_090 1.417 4.190 0.159 8.599
## SI072_S3_097 1.628 3.991 0.759 8.647
## SI072_S3_100 1.275 3.231 0.392 8.703
## SI072_S3_110 1.953 3.633 0.127 8.727
## SI072_S3_120 1.085 3.463 0.519 8.796
## SI072_S3_135 2.577 4.815 0.658 8.882
## SI072_S3_150 0.000 8.323 0.000 9.002
## SI072_S3_165 0.732 23.831 2.291 9.041
## SI072_S3_185 0.000 310.068 0.000 9.091
## SI072_S3_200 0.000 774.034 12.745 9.117
## Conductivity_mScm_1 Fluorescence_mgm_3 OxygenSBE_V
## SI072_S3_010 33.534 3.521 4.954
## SI072_S3_020 32.731 0.207 3.654
## SI072_S3_040 32.149 0.157 3.246
## SI072_S3_060 32.090 0.099 2.243
## SI072_S3_075 32.465 0.171 1.031
## SI072_S3_085 32.704 0.167 0.592
## SI072_S3_090 32.812 0.115 0.619
## SI072_S3_097 32.890 0.087 0.790
## SI072_S3_100 32.970 0.109 0.872
## SI072_S3_110 33.053 0.179 0.632
## SI072_S3_120 33.188 0.197 0.742
## SI072_S3_135 33.345 0.108 0.469
## SI072_S3_150 33.526 0.181 0.089
## SI072_S3_165 33.597 0.132 0.069
## SI072_S3_185 33.681 0.199 0.066
## SI072_S3_200 33.727 0.236 0.063
## Salinity_PSU Density_q
## SI072_S3_010 28.121 21.098
## SI072_S3_020 28.763 21.923
## SI072_S3_040 29.352 22.619
## SI072_S3_060 30.115 23.365
## SI072_S3_075 30.551 23.714
## SI072_S3_085 30.745 23.858
## SI072_S3_090 30.803 23.895
## SI072_S3_097 30.838 23.915
## SI072_S3_100 30.872 23.933
## SI072_S3_110 30.932 23.977
## SI072_S3_120 31.007 24.026
## SI072_S3_135 31.088 24.076
## SI072_S3_150 31.164 24.118
## SI072_S3_165 31.197 24.138
## SI072_S3_185 31.231 24.157
## SI072_S3_200 31.248 24.167
metadata = read.table(file="Saanich.metadata.txt", header=TRUE, row.names=1, sep="\t")
filter(metadata, CH4_nM > 100 & Temperature_C < 10) %>%
select(Depth_m, Temperature_C, CH4_nM)
## Depth_m Temperature_C CH4_nM
## 1 185 9.091 310.068
## 2 200 9.117 774.034
library(dplyr)
metadata = read.table(file="Saanich.metadata.txt", header=TRUE, row.names=1, sep="\t")
select(metadata, ends_with("nM")) %>%
mutate(N2O_uM = N2O_nM/1000) %>%
mutate(Std_N2O_uM = Std_N2O_nM/1000) %>%
mutate(CH4_uM = CH4_nM/1000) %>%
mutate(Std_CH4_uM = Std_CH4_nM/1000)
## N2O_nM Std_N2O_nM CH4_nM Std_CH4_nM N2O_uM Std_N2O_uM CH4_uM
## 1 0.849 0.114 1030.478 3.070 0.000849 0.000114 1.030478
## 2 13.199 0.000 29.012 0.000 0.013199 0.000000 0.029012
## 3 12.829 1.509 37.146 2.695 0.012829 0.001509 0.037146
## 4 12.306 0.524 36.501 3.521 0.012306 0.000524 0.036501
## 5 13.896 1.417 24.013 0.435 0.013896 0.001417 0.024013
## 6 12.959 0.955 7.376 0.029 0.012959 0.000955 0.007376
## 7 15.551 1.417 4.190 0.159 0.015551 0.001417 0.004190
## 8 18.682 1.628 3.991 0.759 0.018682 0.001628 0.003991
## 9 18.087 1.275 3.231 0.392 0.018087 0.001275 0.003231
## 10 15.843 1.953 3.633 0.127 0.015843 0.001953 0.003633
## 11 16.304 1.085 3.463 0.519 0.016304 0.001085 0.003463
## 12 12.909 2.577 4.815 0.658 0.012909 0.002577 0.004815
## 13 11.815 0.000 8.323 0.000 0.011815 0.000000 0.008323
## 14 6.310 0.732 23.831 2.291 0.006310 0.000732 0.023831
## 15 0.000 0.000 310.068 0.000 0.000000 0.000000 0.310068
## 16 0.000 0.000 774.034 12.745 0.000000 0.000000 0.774034
## Std_CH4_uM
## 1 0.003070
## 2 0.000000
## 3 0.002695
## 4 0.003521
## 5 0.000435
## 6 0.000029
## 7 0.000159
## 8 0.000759
## 9 0.000392
## 10 0.000127
## 11 0.000519
## 12 0.000658
## 13 0.000000
## 14 0.002291
## 15 0.000000
## 16 0.012745
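The nM to uM conversion above repeats one mutate call per column. As an alternative sketch in base R, the loop below converts every column whose name ends in `_nM`. The small `gases` data frame is a hypothetical stand-in that reuses the first two rows of values printed above, so that the example runs without the metadata file.

```r
# Stand-in data: first two rows of N2O_nM and CH4_nM from the output above
gases <- data.frame(
  N2O_nM = c(0.849, 13.199),
  CH4_nM = c(1030.478, 29.012)
)

# Find every nanomolar column and add a micromolar copy (1 uM = 1000 nM)
nM_cols <- grep("_nM$", names(gases), value = TRUE)
for (col in nM_cols) {
  uM_name <- sub("_nM$", "_uM", col)
  gases[[uM_name]] <- gases[[col]] / 1000
}

print(gases)
```

Matching on the column suffix means any additional `_nM` columns would be converted without further edits.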
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------ tidyverse 1.2.1 --
## v ggplot2 2.2.1 v readr 1.1.1
## v tibble 1.4.2 v purrr 0.2.4
## v tidyr 0.8.0 v stringr 1.2.0
## v ggplot2 2.2.1 v forcats 0.2.0
## -- Conflicts --------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(readr)
library(phyloseq)
library(ggplot2)
library(dplyr)
library(knitr)
metadata = read.table(file="Saanich.metadata.txt", header=TRUE, row.names=1, sep="\t")
ggplot(metadata, aes(x=NH4_uM, y=Depth_m)) +
geom_point(color="purple", shape=17)
metadata = read.table(file="Saanich.metadata.txt", header=TRUE, row.names=1, sep="\t")
select(metadata, Temperature_C, Depth_m) %>%
mutate(Temperature_F = (Temperature_C * (9/5)+32))
## Temperature_C Depth_m Temperature_F
## 1 12.854 10 55.1372
## 2 11.005 20 51.8090
## 3 9.536 40 49.1648
## 4 8.540 60 47.3720
## 5 8.480 75 47.2640
## 6 8.538 85 47.3684
## 7 8.599 90 47.4782
## 8 8.647 97 47.5646
## 9 8.703 100 47.6654
## 10 8.727 110 47.7086
## 11 8.796 120 47.8328
## 12 8.882 135 47.9876
## 13 9.002 150 48.2036
## 14 9.041 165 48.2738
## 15 9.091 185 48.3638
## 16 9.117 200 48.4106
ex2 = select(metadata, Temperature_C, Depth_m) %>%
mutate(Temperature_F = (Temperature_C * (9/5)+32))
ggplot(ex2, aes(x=Temperature_F, y=Depth_m)) +
geom_point()
load(file="physeq.RData")
physeq_percent = transform_sample_counts(physeq, function(x) 100 * x/sum(x))
plot_bar(physeq_percent, fill="Order") +
geom_bar(aes(fill=Order), stat="identity") +
ggtitle("Saanich Inlet Taxonomic Abundance (10-200m)") +
xlab("Depth of Sample")
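The function passed to transform_sample_counts above rescales the counts in each sample so that they sum to 100. The base R sketch below applies the same function to a small made-up count matrix, with taxa as rows and samples as columns (an assumed layout), to show the effect without needing phyloseq or the physeq.RData file.

```r
# Hypothetical counts: 3 taxa (rows) in 2 samples (columns)
counts <- matrix(
  c(10, 30, 60,
     5,  5, 10),
  nrow = 3,
  dimnames = list(c("taxon_A", "taxon_B", "taxon_C"),
                  c("sample_1", "sample_2"))
)

# Same function as in the plotting code: counts to percent of sample total
to_percent <- function(x) 100 * x / sum(x)

# Apply per sample (per column)
percent <- apply(counts, 2, to_percent)

print(percent)
print(colSums(percent))
```

Every column sums to 100 after the transform, so samples with different sequencing depths can be compared on the same axis in a stacked bar plot.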
metadata = read.table(file="Saanich.metadata.txt", header=TRUE, row.names=1, sep="\t")
ex4 = select(metadata, ends_with("uM"), Depth_m)
facet = gather(ex4, key = "Nutrient", value = "uM", ends_with("uM"))
ggplot(facet, aes(x=Depth_m, y=uM))+
geom_line()+
geom_point()+
facet_wrap(~Nutrient, scales="free_y") +
theme(legend.position="none")
Just how dependent is the human race on the global microbiome for its survival, and how well can these microbes fare in the absence of humans? In this paper, I will address the statement “Microbial life can easily live without us; we, however, cannot survive without the global catalysis and environmental transformations it provides”, present arguments as to why I agree with it, and provide evidence to support my case. It is important to delineate the purpose behind this assessment and the implications of the decision to support or refute this statement. Since the birth of humankind, we see the footprints of our species’ ability to significantly impact our environment, as evident in the global disappearance of megafauna following over-hunting by early humans (Schrag, 2012). This impact only grew with the invention of agriculture and, later, the Industrial Revolution. We are currently facing a great challenge to ensure the preservation of life on planet Earth as we know it. To accomplish this monumental task, the questions we need to ask in the context of the prompt are: How dependent are humans on microbial engines for biogeochemical cycling and environmental transformations? How will microbes respond to these changes and can they fix the problems we have caused? Can we humans prevent the impending catastrophic changes to our planet with our technology alone, or will we need the aid of microbial systems? In this discussion, I will attempt to answer these questions and contemplate our place in the grand scheme of the global ecosystem.
In the current biosphere, we humans, and indeed all living organisms, depend on the intricate biogeochemical networks and cycling mediated by microbial communities. As stated by Falkowski et al (2008), the geophysical processes and acid-base chemistry that shaped the abiotic world created niches that microbes quickly filled, evolving to catalyze the redox reactions that molded the elemental cycles that exist today. Without these microbial systems, it is unlikely that life on Earth, including that of humans, would persist. We will focus on the carbon and nitrogen cycles to illustrate this point.
Carbon is an essential element, making up the structures that form the critical molecules of life, including plasma membrane lipids, the sugars in nucleic acids and the backbone structures of amino acids used to make proteins. The most influential role of microbes in the carbon cycle is their fixation of CO2 via photosynthesis. Net primary production by phytoplankton is approximately 50% of the global total, with the remaining 50% achieved by land plants (Longhurst et al, 1995), indicating that microbes are an integral component of the carbon cycle. This also has major implications for the influx of energy into the food web, as heterotrophic microbes and higher eukaryotes depend on this autotrophic transformation of light energy into chemical energy. Also, many organisms directly rely on our oxygen-rich atmosphere for the catabolism of dietary molecules. Besides the explosion of oxygenic photosynthesis by cyanobacteria around 2.7 Gya that created the current atmospheric makeup, we also have microbes to thank for the maintenance of the atmospheric oxygen concentration. The marine autotroph Prochlorococcus is likely to be the most abundant genus in the biosphere and accounts for 13-48% of global oxygen production (Johnson et al, 2006). Another 30-60% of global oxygen is produced by single-celled marine algae. Hence we humans, as organisms dependent on oxidative phosphorylation, and therefore on oxygen as a terminal electron acceptor for respiration, are at the mercy of microbial carbon fixation and oxygen production through photosynthesis.
Nitrogen is another essential component of life and is a key element in amino acids and nucleic acid bases. As nitrogen gas is inherently inert, breaking the triple bond between the two N atoms requires a large activation energy barrier to be overcome (Canfield et al, 2010). The nitrogenase enzyme evolved for exactly that task and quickly spread via horizontal gene transfer among bacteria and archaea. Almost all biotic N-fixation into ammonium (NH4+) is dependent on bacteria and archaea in terrestrial and marine ecosystems. This amounts to around 55% of total N-fixation, with the remaining nitrogen being fixed by anthropogenic processes such as the Haber-Bosch process and legume farming. While some eukaryotes can fix nitrogen (and only because of symbiotic nitrogenase-expressing bacteria!), this quantity is negligible compared to the aforementioned amount. Further along in the cycle, various bacteria oxidize ammonium into nitrite (NO2-) and nitrate (NO3-), which can then be used as terminal electron acceptors in respiration. Plants are heavily dependent on these nitrogen species for their source of nitrogen. On the other end of the cycle, fixed nitrogen must return to the atmosphere as N2 gas. This is accomplished by denitrification and anammox reactions. Anoxic denitrification is highly conserved among bacteria and archaea and can also be found in some protozoa and fungi. By contrast, we humans contribute essentially nothing to the return of nitrogen to the atmosphere! We will return to the N-cycle in the final section of this paper, where human implications will be discussed.
Rough estimates place the grand total of all prokaryotes at 4-6x10^30 cells (Whitman, Coleman, Wiebe, 1998). Although there are limitations to the estimation techniques and, with that, a degree of error, this number is staggering. The total carbon in prokaryotes can be estimated with more accuracy. At a total of 350-550x10^15 g of carbon, prokaryotes make up a massive reservoir of organic carbon, 60-100% of the total plant carbon content (Whitman, Coleman, Wiebe, 1998). Even more impressive is that soil prokaryotes contain cumulatively more nitrogen than all land plants (Whitman, Coleman, Wiebe, 1998). This, once again, asserts the importance of prokaryotes as a nutrient reservoir. In the context of microbial resilience, the sheer microbial abundance in the biosphere is an indicator of stability and persistence. With this huge number of rapidly dividing cells comes an immense net mutation rate. Whitman, Coleman and Wiebe (1998) speculated that four simultaneous mutations in every gene shared by marine heterotrophs, marine autotrophs, soil prokaryotes and prokaryotes in domesticated animal populations will occur once every 24 minutes to 170 hours. Thus, with such a high simultaneous mutation rate, genetic diversity within microbial populations allows for rapid speciation and diversification of metabolic, signaling and stress response pathways. One potent example of this is antibiotic resistance. Currently, pathogenic bacteria are developing resistance to antibiotics faster than we can discover new, effective ones. In a hospital setting, this can occur within days, indicating the impressive adaptability of microbes through rapid division and spontaneous mutation.
Not only are prokaryotes experts in environmental adaptation, they also share their abilities with other microorganisms. Horizontal gene transfer is responsible for the vast diversification of metabolic functions across all taxonomic levels. Although these functions are incredibly widespread and divergent, with some even forming highly niche-specific “boutique genes”, the “core genes” remain conserved (Falkowski et al, 2008). This can be observed throughout various microbial communities in different environments, where the “core” gene set is always represented by several populations within the community. Unsurprisingly, Falkowski et al (2008) dubbed microbes the “Guardians of Metabolism”. Overall, regardless of the apocalyptic environmental or climate changes that might occur in the future, selective pressures will drive microbes to evolve and distribute new metabolic functions to survive and possibly thrive in a new Earth.
It is clear that microbes will be able to survive just fine without humans, as they have done for most of Earth’s geological history. To summarize Nisbet and Sleep’s 2001 Nature review in a nutshell: as new substrates and habitats are created, ecological niches form, and bacteria come to inhabit them in order to use those substrates. This occurred with the earliest ancestral microorganisms that used sulphate around hydrothermal vents, and later on, with the evolution and explosion of oxygen-producing cyanobacteria, aerobic heterotrophs evolved oxidative phosphorylation to catabolize organic carbon sources. We even see this with anthropogenically produced molecules. Recently, polychlorinated biphenyl (PCB)-degrading Rhodococcus samples have been cultured and characterized (Leigh et al., 2006). PCBs are recalcitrant, toxic industrial pollutants that cannot be degraded or removed by conventional means. Another example of microbial resourcefulness is the increase in lignin-degrading bacterial populations downstream of paper production factories. The sheer metabolic resilience and versatility of microorganisms cannot be overstated.
With the two examples just mentioned in the previous section, we begin to see how we humans may become dependent on microbes, not only for their geochemical cycling, but also for their potential as agents of bioremediation. Our industries and transportation systems are responsible for a titanic influx of CO2 and N2O into the atmosphere. We also disrupt biogeochemical cycles by increasing plastic, aluminium, black carbon and concrete burial in global sediments (Mooney, 2016). There is no denying that the carbon cycle has already been thrown out of balance. Current atmospheric CO2 levels are at a record 387 ppm, approaching the boundary limit for irreversible loss of the polar ice caps and severe climate change (Rockstrom et al., 2009). We must engineer a system capable of trapping or fixing CO2, and given the innate capacity of marine autotrophs to fix carbon, it is not far-fetched that such a system will use microbes for this purpose. We are also drastically increasing the influx of nitrogen to terrestrial and marine ecosystems. With ever-rising demand for food crops, we have increased our nitrogen fertilizer usage by ~800% since the 1960s (Canfield et al., 2010), yet around 60% of this fertilizer is leached away and washed into waterways and oceans, causing widespread eutrophication. Interestingly, microbially mediated denitrification, particularly in anoxic marine habitats, has risen sharply and has, for the most part, balanced the anthropogenic nitrogen influx (Canfield et al., 2010). However, the capacity of this system is uncertain. Plausible solutions include optimizing fertilizer usage and engineering endosymbiosis of N-fixing bacteria into crops to decrease the need for fertilizers. Hence, with our current levels of CO2 production and nitrogen usage, humans cannot possibly continue into the future without the buffering effect of microbially catalyzed biogeochemical cycling of essential elements.
Even more likely is that we will have to learn to manipulate these functions in order to maintain our way of life and preserve the Earth (Mooney, 2016).
In this paper, I have discussed the essential microbial component in the biogeochemical cycling of life-sustaining elements, including the carbon and nitrogen cycles. I identified that microbial communities, with their high propensity for simultaneous mutations and horizontal gene transfer, as well as their sheer abundance, possess a great degree of environmental and metabolic resilience. Lastly, I highlighted the significance of microbially mediated environmental transformations and metabolic functions in cushioning humanity’s detrimental effects on the environment and climate. It is for these reasons that I agree with the statement: “Microbial life can easily live without us; we, however, cannot survive without the global catalysis and environmental transformations it provides.” In the current state of affairs, we are pushing the planetary boundaries described by Rockstrom et al. (2009), and should we fail to limit our destructive behaviors, it is probable that the Earth will experience severe climate and environmental transformations, along with drastic alterations in existing biogeochemical cycles. The prospect of understanding the global microbial metabolic metagenome and engineering specific functions to suit the planet’s needs is an arduous one, possibly involving petabytes upon petabytes of metagenomic sequencing data and analysis, not to mention the application and genetic engineering steps. We are headed into the first ever anthropogenically engineered eon, and it will be our knowledge and manipulation of global microbial functions that lead us either to human extinction or to a lasting, prosperous worldwide symbiosis.
Canfield, D. E., Glazer, A. N., & Falkowski, P. G. (2010). The Evolution and Future of Earth’s Nitrogen Cycle. Science, 330(6001), 192-196. doi:10.1126/science.1186120
Falkowski, P. G., Fenchel, T., & Delong, E. F. (2008). The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science, 320(5879), 1034-1039. doi:10.1126/science.1153213
Johnson, Z. I., Zinser, E. R., Coe, A., McNulty, N. P., Woodward, E. M. S., & Chisholm, S. W. (2006). Niche Partitioning Among Prochlorococcus Ecotypes Along Ocean-Scale Environmental Gradients. Science, 311(5768), 1737-1740. doi:10.1126/science.1118052
Kasting, J. F., & Siefert, J. L. (1984). The evolution of the prebiotic atmosphere. Origins of Life, 14(1-4), 75-82. doi:10.1007/bf00933642
Leigh, M. B., Prouzova, P., Mackova, M., Macek, T., Nagle, D. P., & Fletcher, J. S. (2006). Polychlorinated Biphenyl (PCB)-Degrading Bacteria Associated with Trees in a PCB-Contaminated Site. Applied and Environmental Microbiology, 72(4), 2331-2342. doi:10.1128/aem.72.4.2331-2342.2006
Longhurst, A., Sathyendranath, S., Platt, T., & Caverhill, C. (1995). An estimate of global primary production in the ocean from satellite radiometer data. Journal of Plankton Research, 17(6), 1245-1271. doi:10.1093/plankt/17.6.1245
Mooney, C. (2016, January 07). Scientists say humans have now brought on an entirely new geologic epoch. Retrieved February 16, 2018, from https://www.washingtonpost.com/news/energy-environment/wp/2016/01/07/scientists-say-humans-have-now-brought-on-an-entirely-new-geologic-epoch/
Nisbet, E. G., & Sleep, N. H. (2001). The habitat and nature of early life. Nature, 409(6823), 1083-1091. doi:10.1038/35059210
Rockström, J., Steffen, W., Noone, K., Persson, Å., Chapin, F. S., Lambin, E. F., Lenton, T. M., Scheffer, M., Folke, C., Schellnhuber, H. J. (2009). A safe operating space for humanity. Nature, 461, 472-475.
Whitman, W. B., Coleman, D. C., & Wiebe, W. J. (1998). Prokaryotes: The unseen majority. Proceedings of the National Academy of Sciences, 95(12), 6578-6583. doi:10.1073/pnas.95.12.6578
Achenbach, J. (2012, January 2). Spaceship Earth: A new view of environmentalism. The Washington Post. Retrieved February 15, 2018, from: https://www.washingtonpost.com/national/health-science/spaceship-earth-a-new-view-of-environmentalism/2011/12/29/gIQAZhH6WP_story.html?noredirect=on&utm_term=.46b9533858cf
Falkowski, P. G. (2000). The Global Carbon Cycle: A Test of Our Knowledge of Earth as a System. Science, 290(5490), 291-296. doi:10.1126/science.290.5490.291
Kallmeyer, J., Pockalny, R., Adhikari, R. R., Smith, D. C., & Dhondt, S. (2012). Global distribution of microbial abundance and biomass in subseafloor sediment. Proceedings of the National Academy of Sciences, 109(40), 16213-16216. doi:10.1073/pnas.1203849109
Leopold, A. C. (1997). The Land Ethic by Aldo Leopold. A Sand County Almanac, 193-198. doi:10.1007/978-1-4615-6003-6_19
Longhurst, A., Sathyendranath, S., Platt, T., & Caverhill, C. (1995). An estimate of global primary production in the ocean from satellite radiometer data. Journal of Plankton Research,17(6), 1245-1271. doi:10.1093/plankt/17.6.1245
Mooney, C. (2016, January 07). Scientists say humans have now brought on an entirely new geologic epoch. Retrieved February 16, 2018, from https://www.washingtonpost.com/news/energy-environment/wp/2016/01/07/scientists-say-humans-have-now-brought-on-an-entirely-new-geologic-epoch/
Nisbet, E. G., & Sleep, N. H. (2001). The habitat and nature of early life. Nature,409(6823), 1083-1091. doi:10.1038/35059210
Rockström. J., Steffen. W., Noone. K., Persson. Å., Chapin. F. S., Lambin. E. F., Lenton. T. M., Scheffer. M., Folke. C., Schellnhuber H. J.. (2009). A safe operating space for humanity. Nature,461, 472-475
Waters, C. N., Zalasiewicz, J., Summerhayes, C., Barnosky, A. D., Poirier, C., Galuszka, A., . . . Wolfe, A. P. (2016, January 08). The Anthropocene is functionally and stratigraphically distinct from the Holocene. Retrieved from http://science.sciencemag.org/content/351/6269/aad2622
* Discuss the relationship between microbial community structure and metabolic diversity
* Evaluate common methods for studying the diversity of microbial communities
* Recognize basic design elements in metagenomic workflows
What are the main structural and functional features of proteorhodopsin expressed in E. coli?
How can HGT affect the acquisition of proteorhodopsin expression genes and metabolic capabilities?
What can be concluded about the functional diversity and ubiquity of proteorhodopsin?
A marine picoplankton large-insert fosmid genomic library was surveyed for in vivo expression of proteorhodopsin (PR) photosystems. Following phenotypic analysis, genetic and biochemical analyses were used to confirm the gene functions.
Two clones were found to carry PR photosystem gene clusters that were expressed in vivo from fosmids. These photosystems showed light-dependent proton translocation activity, supported retinal biosynthesis in E. coli, and enabled photophosphorylation. These genes are also capable of being transferred via HGT.
How is PR photosystem expression induced? Is it induced during times of nutritional starvation and low ATP production?
How often do HGT events of PR photosystems occur in nature and just how widespread is the distribution and variation of these genes?
Specific emphasis should be placed on the process used to find the answer. Be as comprehensive as possible e.g. provide URLs for web sources, literature citations, etc.
(Reminders for how to format links, etc in RMarkdown are in the RMarkdown Cheat Sheets)
2016: 89 bacterial phyla and 20 archaeal phyla, determined via small-subunit (16S) rRNA databases, but there could be up to 1500 bacterial phyla (microbes in the “shadow biosphere”)
thousands,
Databases - MG-RAST, NCBI RefSeq, IMG/M
Assembly - EULER
Binning - S-GSOM
Annotation - KEGG
Analysis pipelines - MEGAN 5
Standalone Software - OTUbase
Analysis pipelines - SILVA
Denoising - AmpliconNoise
Databases - Ribosomal Database Project (RDP)
What is the difference between phylogenetic and functional gene anchors and how can they be used in metagenome analysis?
What is metagenomic sequence binning? What types of algorithmic approaches are used to produce sequence bins? What are some risks and opportunities associated with using sequence bins for metabolic reconstruction of uncultivated microorganisms?
the process of grouping sequences that come from a single genome
types of algorithms: align sequences to a database, or group sequences with each other based on DNA characteristics (GC content, codon usage)
risks: incomplete coverage of genome sequence (partial data), contamination from different phylogeny
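The composition-based approach noted above can be sketched in a few lines of base R. This is only a toy illustration under stated assumptions: the contig names and sequences are invented, GC content is the sole feature, and hierarchical clustering stands in for whatever algorithm a real binning tool would use.

```r
# Hypothetical contigs; names and sequences are invented for illustration.
contigs <- c(
  contig_a = "ATGCGCGGCCGCGGGCCCGCGTTAGCGC",
  contig_b = "ATATTTAAATATTTAATTAAATATATTA",
  contig_c = "GCGCCCGGGCGCGCATGCCGGCGCGGCC",
  contig_d = "TTAAATATAATTTTAAATTATAAATATT"
)

# Fraction of G and C bases in one sequence.
gc_content <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  mean(bases %in% c("G", "C"))
}

gc <- vapply(contigs, gc_content, numeric(1))

# Group contigs with similar GC content into two bins.
bins <- cutree(hclust(dist(gc)), k = 2)
split(names(contigs), bins)
```

With real data, a single feature such as GC content is rarely enough to separate genomes, which is one source of the contamination risk listed above.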
functional screens (biochemical tests), 3rd generation sequencing (nanopore) as a good hybrid between single cell sequencing and shotgun sequencing, single cell sequencing, FISH probes; possible to combine methods?
Madsen, E. L. (2005). Identifying microorganisms responsible for ecologically significant biogeochemical processes. Nature Reviews Microbiology, 3(5), 439-446. doi:10.1038/nrmicro1151
Martinez, A., Bradley, A. S., Waldbauer, J. R., Summons, R. E., & Delong, E. F. (2007). Proteorhodopsin photosystem gene expression enables photophosphorylation in a heterologous host. Proceedings of the National Academy of Sciences, 104(13), 5590-5595. doi:10.1073/pnas.0611470104
Mewis, K., Taupp, M., & Hallam, S. J. (2011). A High Throughput Screen for Biomining Cellulase Activity from Metagenomic Libraries. Journal of Visualized Experiments, (48). doi:10.3791/2461
Wooley, J. C., Godzik, A., & Friedberg, I. (2010). A Primer on Metagenomics. PLoS Computational Biology, 6(2). doi:10.1371/journal.pcbi.1000667
* Evaluate the concept of microbial species based on environmental surveys and cultivation studies.
* Explain the relationship between microdiversity, genomic diversity and metabolic potential
* Comment on the forces mediating divergence and cohesion in natural microbial communities
Between 3 different strains of E. coli (2 uropathogenic, 1 non-pathogenic), what degree of genetic similarity/OTU similarity can be observed?
What degree of conservation of pathogenicity associated genes/PAIs exists between the strains?
How does lateral gene transfer contribute to the emergence of new uropathogenic E. coli strains?
Clones and sequencing methods:
pBluescript and M13Janus were used to prepare whole-genome libraries from genomic DNA. Data were collected on Applied Biosystems ABI377 and 3700 automated sequencers. SEQMANII was used to assemble the sequence data, and primer walking and PCR-based analysis were used to finish sequencing the opposite ends of linking clones. A XhoI optical map of the whole genome was used to order contigs and confirm contig structure.
Sequence analysis and annotation methods:
ORFs were defined with GLIMMER and the genome sequence was annotated in MAGPIE. Predicted proteins were searched with protein BLAST, and shotgun sequencing and PCR were performed. Orthology was inferred when a CFT073 gene matched a gene in either MG1655 or EDL933 at 90% identity or higher, the alignment covered at least 90% of both genes, and no equivalent match existed at another location in the CFT073 genome.
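The three orthology criteria can be expressed as a simple filter over a table of pairwise hits. The sketch below is a simplified reading of those criteria, not the authors' pipeline: the table layout, column names, gene identifiers and values are all hypothetical, and "no equivalent match elsewhere" is approximated as "only one CFT073 gene passes for a given MG1655 gene".

```r
# Hypothetical BLAST-style hit table; gene names and values are invented.
hits <- data.frame(
  cft073_gene  = c("c0001", "c0002", "c0003", "c0004"),
  mg1655_gene  = c("b0001", "b0002", "b0003", "b0003"),
  pct_identity = c(98.5, 85.0, 95.0, 93.0),
  cft073_cov   = c(0.99, 0.95, 0.97, 0.92),  # fraction of CFT073 gene aligned
  mg1655_cov   = c(0.97, 0.96, 0.95, 0.94)   # fraction of MG1655 gene aligned
)

# Criteria 1 and 2: at least 90% identity and at least 90% of both genes aligned.
passes <- hits$pct_identity >= 90 &
  hits$cft073_cov >= 0.90 &
  hits$mg1655_cov >= 0.90

# Criterion 3 (simplified): the match must be unique, so an MG1655 gene
# with passing hits at two CFT073 locations is treated as ambiguous.
n_pass <- tapply(passes, hits$mg1655_gene, sum)
unique_targets <- names(n_pass)[n_pass == 1]

orthologs <- hits[passes & hits$mg1655_gene %in% unique_targets, ]
orthologs
```

Here only the c0001/b0001 pair is retained: c0002 fails the identity threshold, and b0003 has two equally acceptable partners.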
The CFT073 genome is a circular, 5 million-bp chromosomal sequence with sevenfold coverage. Selective pressure was found to result in increased conservation of PAIs involved in infection and host colonization. The backbone sequence is evolutionarily conserved through vertical gene transfer and remains relatively constant. The sites at which pathogenicity genes are inserted are conserved, but only the locations are conserved, not the identity of the genes inserted there. There is great genetic diversity between the PAIs of different pathogenic strains, and also between those of pathogenic and non-pathogenic strains. Each strain of E. coli possesses a combination of island genes that confers its characteristic lifestyle and associated level of pathogenicity. The uropathogenic strain CFT073, for example, acquired an island gene that allows it to infect the urinary tract and bloodstream without compromising its ability to colonize the intestine. The net result is a mosaic genome structure in which newly acquired genes are inserted into the backbone framework sequence. This allows E. coli to be distinguished from its close relatives such as Salmonella enterica.
How similar/different are pathogenic E. coli strains from other pathogens such as Shigella and Y. pestis?
How can species be defined to account for the frequent gain and loss of accessory genes, given that we cannot define species by phenotypic analysis and low-resolution mapping alone?
Why has there been no evidence of extensive genome reduction, despite E. coli having lived as a commensal in the nutrient-rich intestinal system of animals for millennia?
* Comment on the creative tension between gene loss, duplication and acquisition as it relates to microbial genome evolution
* Identify common molecular signatures used to infer genomic identity and cohesion
* Differentiate between mobile elements and different modes of gene transfer
In the context of the human body, ecotypes are populations that inhabit a specific tissue environment and are distinguished from one another by subtle genetic and phenotypic differences. The great variation between different tissue types in the human body results in many ecotypes. Tissues differ in the receptors and secreted factors that make up their extracellular matrix. Thus, different tissues present a range of different factors that affect the ability of microbes to colonize them. Each specific tissue environment creates a niche that allows only some organisms to survive there. Parallel to this, pathogenic microbes possess a set of genes that allow them to infect only a specific type of tissue, as seen with the strains mentioned in the paper. These adaptive traits result in increased fitness, thus encouraging their persistence through vertical descent; however, it is likely that new, non-pathogenic strains could acquire these genes through HGT and become able to infect the tissue type. The figure shows that between pathogenic strains, many pathogenicity-associated genes are located in the same loci while maintaining a common backbone sequence. It is likely that these genes in the same locus were derived from a common ancestral gene.
Gain experience estimating diversity within a hypothetical microbial community
Obtain a collection of “microbial” cells from “seawater”. The cells were concentrated from different depth intervals by a marine microbiologist travelling along the Line-P transect in the northeast subarctic Pacific Ocean off the coast of Vancouver Island British Columbia.
Sort out and identify different microbial “species” based on shared properties or traits. Record your data in this Rmarkdown using the example data as a guide.
Once you have defined your binning criteria, separate the cells using the sampling bags provided. These operational taxonomic units (OTUs) will be considered separate “species”. This problem set is based on content available at What is Biodiversity.
For example, load in the packages you will use.
#To make tables
library(kableExtra)
library(knitr)
#To manipulate and plot data
library(tidyverse)
Then load in the data. You should use a similar format to record your community data.
example_data1 = data.frame(
number = c(1,2,3),
name = c("lion", "tiger", "bear"),
characteristics = c("brown cat", "striped cat", "not a cat"),
occurences = c(2, 4, 1)
)
Finally, use these data to create a table.
example_data1 %>%
kable("html") %>%
kable_styling(bootstrap_options = "striped", font_size = 10, full_width = F)
| number | name | characteristics | occurences |
|---|---|---|---|
| 1 | lion | brown cat | 2 |
| 2 | tiger | striped cat | 4 |
| 3 | bear | not a cat | 1 |
For your sample:
Community_total1 = data.frame(
number = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20),
name = c("kiss", "red short string", "red long string", "white spiral", "mike ike", "watermelon", "gummy bear", "marshmallow bear", "lego square", "lego rectangle", "coke", "spider", "big round candy", "skittle", "m&m", "fish", "elongated noodle", "elongated cloud", "elongated oval", "elongated diamond"),
characteristics = c("teardrop shape", "short string", "long string", "large ellipsoid flat", "rounded rod", "rectangular", "clear bear shape", "opaque bear shape", "large square brick", "small rectangular brick", "coke bottle shape", "spider shape", "sphere", "ellipsoid round", "small elipsoid flat", "fish shape", "short noodle shape", "cloud shape", "oval", "diamond"),
occurences = c(16,6,10,3,172,1,99,3,3,15,3,6,24,187,243,1,4,3,1,1)
)
Community_total1 %>%
kable("html") %>%
kable_styling(bootstrap_options = "striped", font_size = 10, full_width = F)
| number | name | characteristics | occurences |
|---|---|---|---|
| 1 | kiss | teardrop shape | 16 |
| 2 | red short string | short string | 6 |
| 3 | red long string | long string | 10 |
| 4 | white spiral | large ellipsoid flat | 3 |
| 5 | mike ike | rounded rod | 172 |
| 6 | watermelon | rectangular | 1 |
| 7 | gummy bear | clear bear shape | 99 |
| 8 | marshmallow bear | opaque bear shape | 3 |
| 9 | lego square | large square brick | 3 |
| 10 | lego rectangle | small rectangular brick | 15 |
| 11 | coke | coke bottle shape | 3 |
| 12 | spider | spider shape | 6 |
| 13 | big round candy | sphere | 24 |
| 14 | skittle | ellipsoid round | 187 |
| 15 | m&m | small elipsoid flat | 243 |
| 16 | fish | fish shape | 1 |
| 17 | elongated noodle | short noodle shape | 4 |
| 18 | elongated cloud | cloud shape | 3 |
| 19 | elongated oval | oval | 1 |
| 20 | elongated diamond | diamond | 1 |
#Many species were missed in the sample, so it does not represent the true diversity of the community. Many species were low in abundance and were not picked up in the sample.
Part 2: Collector’s curve To help answer the questions raised in Part 1, you will conduct a simple but informative analysis that is a standard practice in biodiversity surveys. This analysis involves constructing a collector’s curve that plots the cumulative number of species observed along the y-axis and the cumulative number of individuals classified along the x-axis. This curve is an increasing function with a slope that will decrease as more individuals are classified and as fewer species remain to be identified. If sampling stops while the curve is still rapidly increasing then this indicates that sampling is incomplete and many species remain undetected. Alternatively, if the slope of the curve reaches zero (flattens out), sampling is likely more than adequate.
To construct the curve for your samples, choose a cell within the collection at random. This will be your first data point, such that X = 1 and Y = 1. Next, move consistently in any direction to a new cell and record whether it is different from the first. In this step X = 2, but Y may remain 1 or change to 2 if the individual represents a new species. Repeat this process until you have proceeded through all cells in your collection.
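The counting procedure described above can be sketched in base R. The order of draws below is invented for illustration; the point is only that Y counts distinct species seen so far, so it rises by one when a new species appears and stays flat otherwise.

```r
# Hypothetical order in which individual cells were drawn (labels invented).
draws <- c("m&m", "skittle", "m&m", "gummy bear", "skittle",
           "mike ike", "m&m", "kiss", "gummy bear", "m&m")

# X: cumulative number of individuals classified.
x <- seq_along(draws)

# Y: cumulative number of distinct species observed.
# !duplicated() is TRUE only the first time each species appears.
y <- cumsum(!duplicated(draws))

collectors_curve <- data.frame(x = x, y = y)
collectors_curve
```

Because Y can never exceed the number of species in the collection, a Y value larger than the species count is a sign that individuals rather than species were accumulated.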
For example, we load in these data.
example_data2 = data.frame(
x = c(1,2,3,4,5,6,7,8,9,10),
y = c(1,2,3,4,4,5,5,5,6,6)
)
And then create a plot. We will use a scatterplot (geom_point) to plot the raw data and then add a smoother to see the overall trend of the data.
ggplot(example_data2, aes(x=x, y=y)) +
geom_point() +
geom_smooth() +
labs(x="Cumulative number of individuals classified", y="Cumulative number of species observed")
## `geom_smooth()` using method = 'loess'
For your sample:
Community_sample1 = data.frame(
x = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20),
y = c(0,2,6,7,41,41,57,57,57,62,62,64,70,110,164,164,164,164,165,165)
)
ggplot(Community_sample1, aes(x=x, y=y)) +
geom_point() +
geom_smooth() +
labs(x="Cumulative number of individuals classified", y="Cumulative number of species observed")
## `geom_smooth()` using method = 'loess'
#The curve begins to flatten at 8 species and once again at 18 species; the total number of individuals collected is 165.
#Species diversity increases in the sample at the beginning, middle and end of the curve. However, in the intermediary regions, the flatter slope indicates fewer new species counted.
Using the table from Part 1, calculate species diversity using the following indices or metrics.
\(\frac{1}{D}\) where \(D = \sum p_i^2\)
\(p_i\) = the fractional abundance of the \(i^{th}\) species
For example, using the example data 1 with 3 species with 2, 4, and 1 individuals each, \(\frac{1}{D}\) =
species1 = 2/(2+4+1)
species2 = 4/(2+4+1)
species3 = 1/(2+4+1)
1 / (species1^2 + species2^2 + species3^2)
Kiss = 5/147
Red_candy = 1/147
Red_long_candy = 1/147
Mike_ike = 34/147
Gummy_bear = 18/147
Lego_square = 1/147
Lego = 2/147
Spider = 1/147
Big_round = 3/147
Skittle = 31/147
MM = 48/147
Fish = 1/147
cloud = 1/147
Simpsons = 1/ (Kiss^2 + Red_candy^2 + Red_long_candy^2 + Mike_ike^2 + Gummy_bear^2 + Lego_square^2 + Lego^2 + Spider^2 + Big_round^2 + Skittle^2 + MM^2 + Fish^2 + cloud^2)
The higher the value is, the greater the diversity. The maximum value is the number of species in the sample, which occurs when all species contain an equal number of individuals. Because the index reflects both the number of species present (richness) and the relative proportions of each species within a community (evenness), this metric is a diversity metric. Consider that two communities can have the same number of species (equal richness) but differ in how skewed the proportions of each species are (unequal evenness), which would result in different diversity values.
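The richness versus evenness point can be checked with a small helper. The function name `inv_simpson` and the two three-species communities after the first call are made up for illustration; only the 2, 4, 1 community comes from the example data.

```r
# Simpson reciprocal index: 1/D, where D is the sum of squared
# fractional abundances p_i.
inv_simpson <- function(counts) {
  p <- counts / sum(counts)
  1 / sum(p^2)
}

inv_simpson(c(2, 4, 1))     # example community: 49/21, about 2.33
inv_simpson(c(10, 10, 10))  # perfectly even: equals the richness, 3
inv_simpson(c(28, 1, 1))    # same richness, very uneven: about 1.15
```

All three communities have a richness of 3, yet the index falls from 3 toward 1 as one species comes to dominate.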
species1 = 16/(16+6+10+3+172+1+99+3+3+15+3+6+24+187+243+1+4+3+1+1)
species2 = 6/(16+6+10+3+172+1+99+3+3+15+3+6+24+187+243+1+4+3+1+1)
species3 = 10/(16+6+10+3+172+1+99+3+3+15+3+6+24+187+243+1+4+3+1+1)
species4 = 3/(16+6+10+3+172+1+99+3+3+15+3+6+24+187+243+1+4+3+1+1)
species5 = 172/(16+6+10+3+172+1+99+3+3+15+3+6+24+187+243+1+4+3+1+1)
species6 = 1/(16+6+10+3+172+1+99+3+3+15+3+6+24+187+243+1+4+3+1+1)
species7 = 99/(16+6+10+3+172+1+99+3+3+15+3+6+24+187+243+1+4+3+1+1)
species8 = 3/(16+6+10+3+172+1+99+3+3+15+3+6+24+187+243+1+4+3+1+1)
species9 = 3/(16+6+10+3+172+1+99+3+3+15+3+6+24+187+243+1+4+3+1+1)
species10 = 15/(16+6+10+3+172+1+99+3+3+15+3+6+24+187+243+1+4+3+1+1)
species11 = 3/(16+6+10+3+172+1+99+3+3+15+3+6+24+187+243+1+4+3+1+1)
species12 = 6/(16+6+10+3+172+1+99+3+3+15+3+6+24+187+243+1+4+3+1+1)
species13 = 24/(16+6+10+3+172+1+99+3+3+15+3+6+24+187+243+1+4+3+1+1)
species14 = 187/(16+6+10+3+172+1+99+3+3+15+3+6+24+187+243+1+4+3+1+1)
species15 = 243/(16+6+10+3+172+1+99+3+3+15+3+6+24+187+243+1+4+3+1+1)
species16 = 1/(16+6+10+3+172+1+99+3+3+15+3+6+24+187+243+1+4+3+1+1)
species17 = 4/(16+6+10+3+172+1+99+3+3+15+3+6+24+187+243+1+4+3+1+1)
species18 = 3/(16+6+10+3+172+1+99+3+3+15+3+6+24+187+243+1+4+3+1+1)
species19 = 1/(16+6+10+3+172+1+99+3+3+15+3+6+24+187+243+1+4+3+1+1)
species20 = 1/(16+6+10+3+172+1+99+3+3+15+3+6+24+187+243+1+4+3+1+1)
1 / (species1^2 + species2^2 + species3^2 + species4^2 + species5^2 + species6^2 + species7^2 + species8^2 + species9^2 + species10^2 + species11^2 + species12^2 + species13^2 + species14^2 + species15^2 + species16^2 + species17^2 + species18^2 + species19^2 + species20^2)
## [1] 4.763291
species1 = 0/(0+2+4+1+34+0+16+0+0+5+0+2+6+40+54+0+0+0+1+0)
species2 = 2/(0+2+4+1+34+0+16+0+0+5+0+2+6+40+54+0+0+0+1+0)
species3 = 4/(0+2+4+1+34+0+16+0+0+5+0+2+6+40+54+0+0+0+1+0)
species4 = 1/(0+2+4+1+34+0+16+0+0+5+0+2+6+40+54+0+0+0+1+0)
species5 = 34/(0+2+4+1+34+0+16+0+0+5+0+2+6+40+54+0+0+0+1+0)
species6 = 0/(0+2+4+1+34+0+16+0+0+5+0+2+6+40+54+0+0+0+1+0)
species7 = 16/(0+2+4+1+34+0+16+0+0+5+0+2+6+40+54+0+0+0+1+0)
species8 = 0/(0+2+4+1+34+0+16+0+0+5+0+2+6+40+54+0+0+0+1+0)
species9 = 0/(0+2+4+1+34+0+16+0+0+5+0+2+6+40+54+0+0+0+1+0)
species10 = 5/(0+2+4+1+34+0+16+0+0+5+0+2+6+40+54+0+0+0+1+0)
species11 = 0/(0+2+4+1+34+0+16+0+0+5+0+2+6+40+54+0+0+0+1+0)
species12 = 2/(0+2+4+1+34+0+16+0+0+5+0+2+6+40+54+0+0+0+1+0)
species13 = 6/(0+2+4+1+34+0+16+0+0+5+0+2+6+40+54+0+0+0+1+0)
species14 = 40/(0+2+4+1+34+0+16+0+0+5+0+2+6+40+54+0+0+0+1+0)
species15 = 54/(0+2+4+1+34+0+16+0+0+5+0+2+6+40+54+0+0+0+1+0)
species16 = 0/(0+2+4+1+34+0+16+0+0+5+0+2+6+40+54+0+0+0+1+0)
species17 = 0/(0+2+4+1+34+0+16+0+0+5+0+2+6+40+54+0+0+0+1+0)
species18 = 0/(0+2+4+1+34+0+16+0+0+5+0+2+6+40+54+0+0+0+1+0)
species19 = 1/(0+2+4+1+34+0+16+0+0+5+0+2+6+40+54+0+0+0+1+0)
species20 = 0/(0+2+4+1+34+0+16+0+0+5+0+2+6+40+54+0+0+0+1+0)
1 / (species1^2 + species2^2 + species3^2 + species4^2 + species5^2 + species6^2 + species7^2 + species8^2 + species9^2 + species10^2 + species11^2 + species12^2 + species13^2 + species14^2 + species15^2 + species16^2 + species17^2 + species18^2 + species19^2 + species20^2)
## [1] 4.526185
Another way to calculate diversity is to estimate the number of species that are present in a sample based on the empirical data to give an upper boundary of the richness of a sample. Here, we use the Chao1 richness estimator.
\(S_{chao1} = S_{obs} + \frac{a^2}{2b}\)
\(S_{obs}\) = total number of species observed
a = species observed once
b = species observed twice or more
So for our previous example community of 3 species with 2, 4, and 1 individuals each, \(S_{chao1}\) =
3 + 1^2/(2*2)
Chao1 = (13 + ((6^2)/(2*7)))
* What is the chao1 estimate for your sample?
* What is the chao1 estimate for your original total community?
20 + 4^2/(16*2)
## [1] 20.5
11 + 2^2/(9*2)
## [1] 11.22222
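The hand calculations above can be wrapped in a small function. The name `chao1_course` is hypothetical, and the function follows the definitions given in this problem set, where b counts species observed twice or more. The classic Chao1 estimator instead takes b as the number of species observed exactly twice, so values from this sketch should not be expected to match other implementations.

```r
# Chao1 as defined in this problem set:
# a = species observed once, b = species observed twice or more.
chao1_course <- function(counts) {
  counts <- counts[counts > 0]  # ignore species absent from the sample
  s_obs <- length(counts)       # number of species observed
  a <- sum(counts == 1)
  b <- sum(counts >= 2)
  s_obs + a^2 / (2 * b)
}

chao1_course(c(2, 4, 1))  # example community: 3 + 1^2/(2*2) = 3.25

total_counts <- c(16, 6, 10, 3, 172, 1, 99, 3, 3, 15,
                  3, 6, 24, 187, 243, 1, 4, 3, 1, 1)
chao1_course(total_counts)  # 20 + 4^2/(2*16) = 20.5
```

Passing the vector of counts directly avoids retyping the number of singletons by hand for each community.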
library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.4-6
Community_total1_diversity =
Community_total1 %>%
select(name, occurences) %>%
spread(name, occurences)
Community_total1_diversity
## big round candy coke elongated cloud elongated diamond elongated noodle
## 1 24 3 3 1 4
## elongated oval fish gummy bear kiss lego rectangle lego square m&m
## 1 1 1 99 16 15 3 243
## marshmallow bear mike ike red long string red short string skittle
## 1 3 172 10 6 187
## spider watermelon white spiral
## 1 6 1 3
diversity(Community_total1_diversity, index="invsimpson")
## [1] 4.763291
specpool(Community_total1_diversity)
## Species chao chao.se jack1 jack1.se jack2 boot boot.se n
## All 20 20 0 20 0 20 20 0 1
If you are stuck on some of these final questions, reading the Kunin et al. 2010 and Lundin et al. 2012 papers may provide helpful insights.
As famously stated by James Staley in the third 1997 issue of Current Opinion in Biotechnology, if the current microbial species definition of 97% DNA sequence identity were applied to animal classification [1], the entire order Primates would be considered a single species. Obviously, this is not and cannot be the case. Microbial species have always been defined differently from macroorganisms because delineating new microbial species poses challenges that the methods used for macroorganisms do not face. Although scientists have long recognized this disparity, the question remains: How do we define microbial species? This paper will explore some of the difficulties faced by geneticists and bacteriologists as they attempt to define and characterize a new microbial species and the complications caused by horizontal gene transfer (HGT), focusing on the evolution and phylogenetic distribution of metabolic pathways through time. With HGT being such a prominent mechanism of microbial evolution, this paper will also discuss the influence of HGT in shaping biogeochemical cycles over the course of the Earth’s history. Finally, the argument will be made that, although our definition of microbial species is questionable at best, a clear definition may not be necessary for our study of the microbial world.
Species, as described by the zoological species definition, are “groups of interbreeding or potentially interbreeding natural populations that are reproductively isolated from other such groups” [2]. Already, this presents a problem for delineating microbial species, as bacteria and archaea do not “breed” via conventional sexual means as most macroorganisms do; rather, they reproduce through binary fission. The currently accepted microbial species definition is a DNA similarity of 70% or higher, which corresponds to 97% sequence identity [3]. These values are determined through phenotypic analyses such as fatty acid synthesis and growth conditions, and genetic analyses such as DNA-DNA hybridization, DNA profiling and sequencing, and GC content ratios [3]. This is known as the polyphasic species definition [4]. However, within these methods lies another issue. Phenotypic characterization of species requires the organisms to be culturable in the laboratory, and this is often not the case, as many microbial species only grow in very specific environmental conditions, making it impossible to perform phenotypic analyses. Also, although the listed genetic methods acknowledge the frequency of HGT and gene rearrangement events between microbes, they fail to account for the variation in the rate and mode of these genetic exchanges, which differ drastically between microbial evolutionary lines [3]. Collectively, these issues with defining microbial species are known as the “species concept” [5]. With the rise of 16S phylogeny, multi-gene and whole genome sequencing, phylogenetic analyses appear to be a promising new method with which to define microbial species. Given the power of 16S rRNA analyses to define organism ancestry and evolutionary paths, could they be used to define microbial species to a similar level of precision? The short answer is, unfortunately, no.
Recent rapid evolution in prokaryotic lineages can be observed in primary DNA sequences and through phenotypic changes; however, these changes are not strongly reflected in 16S rRNA, which is relatively stable and rarely subject to alterations [3]. Additionally, some species with almost identical 16S rRNA sequences show far less than 70% DNA-DNA hybridization [6]. Other species cannot be characterized because of their distinctive phylogenetic position and their dissimilarity to known microorganisms [3]. Thus, 16S rRNA analysis is a powerful method for determining microbial ancestry, but it may not be adequate to define microbial species and is unlikely to replace the current DNA-DNA hybridization method. It is, however, useful for determining how much DNA reassociation analysis, which is often slow and complex, is needed [3].
Given its frequency and prevalence in bacteria and archaea, HGT represents one of the biggest challenges in delineating species within the two domains. In the case of antibiotic resistance, it was found that between various E. coli strains, 8-21% of genes are unique to specific strains and are often encoded on genomic islands consisting of multiple genes that confer resistance [7],[8],[9]. Therefore, although there is a significant degree of genetic variation between strains, they are still considered the same “species”. The essential pathogenicity island cagPAI of H. pylori was shown to have been introduced only once in evolutionary history and has been retained in the genome since, while the large variation in other H. pylori pathogenicity-associated genes suggests rapid gain and loss of these lesser functions [10],[11]. Just as in antibiotic resistance, HGT plays an integral role in metabolic pathway evolution. Some HGT events, although rare, can lead to the transfer of entire gene sets that confer adaptation to a new environment and the development of novel metabolic mechanisms [12],[13]. One example is the transfer of genes encoding polychlorinated biphenyl (PCB) degrading enzymes through conjugation between Acidovorax spp. strains, enabling recipients to break down PCBs and use the compounds as part of their metabolism [14]. If a strictly aerobic prokaryote can acquire a whole gene cluster that suddenly enables it to live in an anoxic environment overnight, does this qualify it as a new species? Because we define species on evolutionary timescales and from a macroscopic perspective, it is difficult to provide a clear answer. Yet it is evident that the majority of variable genes acquired through HGT are lost over time, especially if they were obtained during periods of high selective pressure and are associated with an elevated fitness cost [15].
Therefore, a large proportion of the genetic diversity gained by HGT is transient even on short timescales. For this reason, Achtman and Wagner (2008) conclude that genetic variation due to HGT is not sufficient grounds on which to delineate microbial species [15].
Despite the transient nature of the majority of horizontally transferred genes, some functions have been retained over the course of eons. These functions helped shape the biogeochemical cycles of the past and the present. Through HGT, novel changes in metabolic activity can occur, driving ecological change [16]. Historically, HGT between ancestral bacterial and archaeal species resulted in the diversification of metabolic enzymes and was thus influential in the creation of specific ecological niches. This appears to have been encouraged by heavy nutritional selective pressures [16]. The presence of dissimilatory sulfate reductases in sulfate-reducing Gram-positive bacteria, δ-proteobacteria and archaea is evidence for lateral propagation of sulfate respiration [17], which likely resulted in the establishment of sulfur cycling in anoxic environments such as the deep ocean and hydrothermal vents. In the Hadean and Archaean eons, communal evolution through HGT was the primary mechanism in the development of early metabolic pathways [17]. This global pool of mobile catabolic and anabolic genes represented the main mode of evolution [17]. Later in the evolutionary timeline, we see the transfer of various nitrogenases from archaea to photosynthetic cyanobacteria, along with the widespread transfer and evolution of ammonia monooxygenase homologs, all of which are essential enzymes that led to the formation of the modern nitrogen cycle [17]. Even the acquisition of the complex gene clusters required to form nitrogen-fixing root nodules in legumes was a direct result of rare HGT events [18]. The impact of HGT can also be seen in metal biogeochemical cycles, where high-frequency HGT of PIB-type ATPases has been observed between prokaryotes in deep terrestrial subsurface sediments [19].
These ATPases are essential transporters, responsible for moving soft metal ions and essential micronutrients such as Cu2+, Zn2+ and Co2+, as well as for the efflux of toxic ions such as Pb2+ and Ag+ from the cytoplasm. Microbial metal homeostasis plays a crucial role in metal ion cycling in the biosphere, and the HGT of these functions appears to have occurred early in evolutionary history, even before the diversification of the genus Pseudomonas [19]. Despite our best efforts, the vast majority of microbial genetic diversity remains uncharacterized. This is due in part to the presence of unique, isolated habitats such as the deep ocean and terrestrial sediments. The microorganisms that call these environments home are known as the rare biosphere [20]. Because sampling from these inaccessible and often dangerous habitats is very difficult, we understand little about these environments, which could hold a great wealth of genetic and evolutionary information. Limited studies of sulfur respiration in these deep-sea microbes suggest that, having persisted for eons, they likely once had a monumental influence on biogeochemical cycling and may still affect modern cycles [20]. These organisms have been shown to use hydrogen and a reverse tricarboxylic acid (TCA) cycle to fix carbon and may have contributed to the establishment of the early carbon cycle. Evidence of HGT in low-insert libraries from samples supports the model of HGT being the primary mode of evolution for early microbes, especially in hostile environments. We return now to gene loss and transient horizontally transferred genes. As microorganisms become more specialized and efficient at performing specific metabolic tasks, they begin to lose genes responsible for other steps in the metabolic pathway.
This creates new niches that other microbes readily fill, producing a collective metabolism shared within entire communities while also limiting ecological overlap between different organisms [21].
Throughout human history, we have attempted to differentiate and classify the other living organisms around us, and we have been largely successful. The models, methods and conventions we use to describe and delineate both microorganisms and macroorganisms have allowed us to study evolutionary history to a fairly accurate degree. Now, with the advent of phylogenetics and 16S rRNA profiling, we are able to characterize new species with even greater precision. However, an absolute definition of microbial species has not been determined. Yet, for the purposes of scientific knowledge and innovation, we may not need a clear definition after all. In the medical world, immunologists are already capable of determining the potential pathogenicity of bacterial and viral infectious agents by detecting minute differences in their genomes. There are already systems in place to describe different populations within a bacterial “species”, such as strains and serovars. The ability of prokaryotes to gain new metabolic and antibiotic resistance functions through HGT is well known and characterized; whether this influences the definition of a bacterial “species”, however, is a matter of description rather than function. Suppose that a particular strain of the nitrogen-fixing soil bacterium Rhizobium leguminosarum gains the ability to degrade viscous hydrocarbons through HGT. Whether this organism is described as a new strain or delineated as an entirely new species is largely irrelevant to its function: regardless of its taxonomy, it is likely to be used in the bioremediation of oil spills.
We acknowledge that HGT, among other challenges, can make the characterization and delineation of microbial species extremely difficult; however, the extent to which HGT should influence our definition of microbial species is highly context dependent. While many variable genes acquired via HGT are transient and are likely to be lost over time, as in the case of transient antibiotic resistance, other horizontally transferred genes have been retained in bacterial and archaeal genomes and became integral components of both ancient and modern biogeochemical cycles. In the case of these long-term genetic changes, the effect of HGT on the microbial species definition is justified. Ultimately, as mentioned in the previous paragraph, the purpose of characterizing new microbial species must also be taken into account. Thus, when attempting to describe a novel microbial species through the phenotypic, genetic and phylogenetic methods used today, we should recognize that this definition is flexible and dependent on the evolutionary timescale considered, especially in the context of HGT. Unless a revolutionary method of analyzing bacterial genetic material is developed, it is unlikely that there will be much divergence from the current microbial species definition.
Saanich Inlet, a saltwater fjord, offers a unique opportunity to study changes in microbial communities throughout the water column because of the pronounced redox gradient that results from restricted water flow and an influx of nutrients. Biomass was collected at multiple depths throughout the inlet, and the V4-V5 region of the 16S rRNA gene was sequenced and processed using both mothur and QIIME2. Overall community structure and the structure of the family Oceanospirillaceae were then studied with changing depth and oxygen concentration. Analyses of both the mothur-processed and the QIIME2-processed data reveal that the abundance of Oceanospirillaceae changes significantly with depth and oxygen concentration. Roughly half (12 of 25) of the OTUs assigned to Oceanospirillaceae changed significantly in abundance, compared to only four of the 28 ASVs. Answers to the assigned questions generally agreed between the two pipelines, with minor differences in the taxa assignments and in the calculated statistics (Chao1 and Shannon Diversity Index) across the depths.
Saanich Inlet is a saltwater fjord located off the coast of Vancouver Island. It is characterized by restricted inflow of water due to the shallow sill at the boundary with the Salish Sea, as well as the seasonal patterns of oxygen depletion, and the associated fluctuations in dissolved NO3- and H2S with depth (1, 2). Nutrient input in the Saanich Inlet is largely dependent on the tidal cycles, which allow for water mixing, leading to subsequent increases in primary productivity. With limited movement of water outside the enclosed basin, organic matter generated in the euphotic zones of the inlet is quickly aerobically respired as it sinks to the bottom. This leads to the ultimate depletion of dissolved oxygen and establishment of Oxygen Minimum Zones (OMZs) prevalent in deepwater regions throughout most of the year, with the exception of late summer and fall seasons when the fjord becomes slightly oxic (3, 4).
Due to the differences in dissolved O2 concentrations and the high redox potential of O2, the fjord hosts a wide variety of microbial metabolisms between depths of 0 and 250 m, with populations preferentially utilizing the strongest electron acceptor readily available at any given depth (3). This leads to a range of oxic, dysoxic, anoxic and sulfidic environments within the basin, making Saanich Inlet an ideal model system for studying microbially-mediated biogeochemical cycles and microbial ecosystems without the need to sample from the open ocean.
In the age of high-throughput processing of environmental samples, it becomes necessary to classify community members based on a single common molecular marker, such as the 16S rRNA gene of bacteria and archaea. Downstream processing and analysis, however, offers a number of alternatives for reconstructing sample communities with an appropriate degree of representativeness. As such, individual sequences can be categorized into Operational Taxonomic Units (OTUs) or Amplicon Sequence Variants (ASVs), depending on the underlying principle. OTU-clustering involves grouping together of individual reads with a predetermined sequence similarity threshold, usually, around 97-99%, where a single OTU observation may be treated as a single event of species observation (5). Grouping of OTUs can also occur in two ways: in closed-referencing, reads that are sufficiently similar to a reference sequence collection are assigned to the corresponding taxon; while in de novo clustering sequence reads are grouped to a taxon based on the pairwise similarity (5, 6). ASVs, on the other hand, are produced through high-resolution identification down to a single nucleotide, allowing for discrimination between closely related organisms, as even slightly dissimilar sequences are considered to have come from different species rather than resulted from processing errors. Sequence reads are then grouped de novo, independent of an arbitrarily set threshold value (5). Previous studies indicate that both ASVs and OTUs can provide meaningful insight into microbial community composition with respect to ecological roles of individual members, although ASV-based analysis has the potential for greater identification sensitivity and, thus, effectiveness in inference of associated ecological patterns (5, 7, 8). 
In this project, we will continue to explore the capacity of OTUs and ASVs in environmental sampling analysis by comparing diversity and distribution of selected taxa across the oxygen gradient in the Saanich Inlet using both methods.
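The distinction between threshold-based OTUs and exact-sequence ASVs described above can be illustrated with a toy example in base R. This is only a sketch under simplifying assumptions: the reads are invented, they are treated as pre-aligned and of equal length, the clustering is a simple greedy centroid scheme, and ASVs are approximated as unique sequences with no error modelling. It is not how mothur or QIIME2 process reads internally, and the function names `percent_identity` and `cluster_otus` are hypothetical.

```r
# Fraction of matching positions between two pre-aligned, equal-length reads.
percent_identity <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  stopifnot(length(x) == length(y))
  mean(x == y)
}

# Greedy centroid clustering: each read joins the first existing centroid
# it matches at or above the threshold, otherwise it founds a new OTU.
cluster_otus <- function(seqs, threshold = 0.97) {
  centroids <- character(0)
  membership <- integer(length(seqs))
  for (i in seq_along(seqs)) {
    hit <- 0L
    for (j in seq_along(centroids)) {
      if (percent_identity(seqs[i], centroids[j]) >= threshold) {
        hit <- j
        break
      }
    }
    if (hit == 0L) {
      centroids <- c(centroids, seqs[i])
      hit <- length(centroids)
    }
    membership[i] <- hit
  }
  membership
}

# Invented 40-nt reads: an exact duplicate, a single-nucleotide variant
# (97.5% identical to the first read) and an unrelated sequence.
base_read <- paste(rep("ACGT", 10), collapse = "")
one_off <- base_read
substr(one_off, 20, 20) <- "A"            # position 20 is "T" in base_read
distant <- paste(rep("TTGA", 10), collapse = "")
reads <- c(base_read, base_read, one_off, distant)

otu_membership <- cluster_otus(reads, threshold = 0.97)
n_otu <- length(unique(otu_membership))   # variant is absorbed into one OTU
n_asv <- length(unique(reads))            # variant is kept as its own ASV
```

In this toy case the single-nucleotide variant is merged into the same OTU as its parent sequence at the 97% threshold but counted separately under the exact-sequence approach, which is the resolution difference the two methods are being compared on.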
This study will focus on diversity and distribution trends of a single family within the Gammaproteobacteria, the Oceanospirillaceae, which comprises over a dozen genera of aquatic bacteria. Oceanospirillaceae are widely distributed in marine environments and often inhabit regions of high salinity in the water column (7, 8). Cells are typically rod- or helix-shaped, and their motility is supported by polar flagella (7). The majority of genera within the family are either aerobic or microaerophilic, relying strictly on oxygen-mediated respiratory processes for metabolism (8). Oceanospirillaceae are chemoorganotrophs, and some members of the family are capable of degrading complex aromatic and branched hydrocarbons, including petroleum, and utilizing them as sole carbon and energy sources. They are also not known to utilize nitrate respiration (8).
While examination of whole classes or phyla can provide robust data on abundance and diversity, we chose to narrow down the scope of our study to a single, relatively specialized family to allow for inference of ecological patterns.
In this project, we aim to analyze and compare the distribution and diversity patterns of Oceanospirillaceae across the oxygen gradient of the Saanich Inlet water column, using both OTU-based and ASV-based methods. We also intend to assess how effectively each method reconstructs the original community from individual sequence reads by comparing the outcomes of the two analyses.
Water was regularly collected from Saanich Inlet at seven depths: 10m, 100m, 120m, 135m, 150m, 165m, and 200m. Samples were analyzed for geochemical data, including oxygen, and used to extract genomic DNA. For DNA extraction, biomass was collected by filtering water through a 0.22 μm Sterivex filter. The variable 4 through 5 region (V4-V5) of the 16S rRNA gene was then sequenced on the Illumina MiSeq platform using 2x300 base-pair technology. The sequences were processed in both mothur and QIIME2, keeping the processing parameters as similar as possible to ensure a consistent analysis. One taxon, the family Oceanospirillaceae, was selected for in-depth analysis. We ensured that the selected taxon was present in more than three samples and contained more than five OTUs.
Plots were constructed and statistical analyses were performed in RStudio (version 1.1.383) running R version 3.4.3, using three libraries: tidyverse, phyloseq, and magrittr. Plots were constructed to visualize the change in diversity with depth and with oxygen concentration, changes in geochemical concentrations across depth, domain and genus abundance across depth, and changes in Oceanospirillaceae abundance with depth and oxygen. For statistical analysis, linear models of abundance vs. depth and of abundance vs. oxygen concentration were constructed for each OTU within Oceanospirillaceae. The resulting p-values were used to determine the significance of the change in abundance.
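As a rough illustration of the per-taxon linear models described above, the following base R sketch fits abundance against depth and against oxygen and extracts the slope p-value. The depth and oxygen values are the seven sampled depths and their O2_uM readings from the sample data; the two abundance columns (`taxon_a`, `taxon_b`) are invented for illustration, and the helper `lm_pvalue` is a hypothetical name rather than code from the original analysis.

```r
# Sampled depths (m) and dissolved oxygen (uM) from the sample data.
depth <- c(10, 100, 120, 135, 150, 165, 200)
o2    <- c(216.667, 38.012, 32.354, 20.446, 0, 0, 0)

# Invented abundances for two hypothetical taxa (not real results).
abund <- data.frame(
  taxon_a = c(120, 60, 45, 30, 12, 15, 4),  # declines with depth
  taxon_b = c(10, 12, 9, 11, 10, 12, 9)     # roughly flat
)

# Fit y ~ x and return the p-value of the slope term.
lm_pvalue <- function(y, x) {
  fit <- lm(y ~ x)
  summary(fit)$coefficients["x", "Pr(>|t|)"]
}

# One p-value per taxon for each predictor.
p_depth  <- sapply(abund, lm_pvalue, x = depth)
p_oxygen <- sapply(abund, lm_pvalue, x = o2)

# Taxa whose abundance changes significantly with depth at alpha = 0.05.
significant_depth <- names(p_depth)[p_depth < 0.05]
```

With only seven depths per model, each fit has five residual degrees of freedom, so the p-values are sensitive to individual samples and are best read alongside the plotted trends.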
An analysis of the sample data using both pipelines revealed nuanced changes in the microbial community as a result of variations in depth and oxygen concentration. Levels of dissolved oxygen decrease with depth, and this has an impact on species diversity. Analysis with the mothur pipeline found that the Shannon Diversity Index (SDI), which is indicative of the species diversity within the sample community, decreases with depth, as shown in Figure 1A and 1B. Analysis with mothur shows that the highest level of species diversity occurred at a depth of 100m and the lowest at 150m, with SDI values of 4.27 and 2.35, respectively. The SDI increases slightly after its minimum at 150m to a value of 2.47 at 200m, though this difference is insignificant. Further, the trendline indicates a slightly higher level of species diversity at a depth of 50m than at 100m, although the high uncertainty between 50m and 75m, denoted by the large grey area, renders any interpolation for that depth range inconclusive, as illustrated in Figure 1A. The largest changes in the SDI occurred between 100m and 150m, which corresponds to dissolved oxygen levels tapering off from 38 μM at 100m to 0 μM at 150m and beyond.
Analysis with QIIME2 revealed a similar trend in the SDI, albeit with several differences. Unlike the values obtained with mothur, the SDI values given by QIIME2 decrease at every sampled depth below 100m, from a maximum of 5.14 at 100m to a minimum of 2.97 at 200m, as illustrated in Figure 1B. The key difference is that the lowest SDI occurs at 200m with QIIME2, whereas with mothur the lowest SDI is observed at 150m. The change in oxygen concentration between 100m and 150m and the corresponding change in the SDI at these depths imply that oxygen levels have an impact on the microbial community.
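For readers unfamiliar with the two alpha-diversity statistics reported here, the sketch below computes them from a vector of per-taxon read counts in base R. It assumes the natural-log form of the Shannon index and the bias-corrected form of Chao1; the exact variants computed by the packages used in this analysis may differ, and the count vector is invented.

```r
# Shannon Diversity Index: H = -sum(p_i * ln(p_i)) over observed taxa.
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
# where F1 and F2 are the numbers of singleton and doubleton taxa.
chao1 <- function(counts) {
  s_obs <- sum(counts > 0)
  f1 <- sum(counts == 1)
  f2 <- sum(counts == 2)
  s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
}

# Invented read counts for nine taxa in one sample.
counts <- c(50, 30, 10, 5, 2, 2, 1, 1, 1)

h_sample <- shannon(counts)
chao1_sample <- chao1(counts)  # 9 observed taxa, 3 singletons, 2 doubletons
```

Because Chao1 depends on the number of singletons and doubletons, it is particularly sensitive to how a pipeline treats rare sequences, whereas the Shannon index weights taxa by relative abundance.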
Across all seven depths, the domain Bacteria is the most abundant, followed by Archaea, as shown in Figure 2. The mothur-processed data displayed three genera of the family Oceanospirillaceae across the sampled depths: Balneatrix, Oleispira, and Pseudohongiella. In the QIIME2-processed data, two different genera, Marinobacterium and Pseudospirillum, were observed in addition to Balneatrix and Oleispira. According to Figure 4A and 4B, the family Oceanospirillaceae was most abundant at a depth of 10m and least abundant at a depth of 200m. The general trend for Oceanospirillaceae is a gradual decline in abundance with increasing water depth. One exception is at a depth of 165m, where the abundance was higher than at the neighbouring depths of 150m and 200m. With regard to individual genera, the highest abundance in the mothur-processed data was observed for Pseudohongiella, followed by Balneatrix, as shown in Figure 3A. In the QIIME2-processed data, by contrast, the highest abundance was observed for Pseudospirillum, followed by Balneatrix, as shown in Figure 3B.
knitr::opts_chunk$set(echo = TRUE, fig.width=10, fig.height=10)
library(repr)
library(knitr)
library(tidyverse)
library(cowplot)
library(phyloseq)
library(magrittr)
library(dplyr)
library(ggsn)
library(sf)
1A.
| rowname | Chao1 | se.chao1 | Shannon | Depth_m | O2_uM | PO4_uM | SiO2_uM |
|---|---|---|---|---|---|---|---|
| Saanich_010 | 1176.7102 | 23.37910 | 3.945966 | 10 | 216.667 | 0.520 | 20.672 |
| Saanich_100 | 1705.9110 | 33.23574 | 4.273747 | 100 | 38.012 | 3.672 | 69.062 |
| Saanich_120 | 1509.2529 | 30.11850 | 3.937085 | 120 | 32.354 | 4.090 | 64.383 |
| Saanich_135 | 942.8395 | 15.31532 | 3.203605 | 135 | 20.446 | 4.342 | 70.321 |
| Saanich_150 | 790.8279 | 20.59633 | 2.349901 | 150 | 0.000 | 4.988 | 70.780 |
| Saanich_165 | 771.8898 | 22.93917 | 2.350053 | 165 | 0.000 | 5.599 | 62.580 |
| Saanich_200 | 812.4231 | 20.40070 | 2.465456 | 200 | 0.000 | 6.298 | 66.200 |

| rowname | NO3_uM | NH4_uM | Std_NH4_uM | NO2_uM | Std_NO2_uM | H2S_uM | Std_H2S_uM | Cells.ml |
|---|---|---|---|---|---|---|---|---|
| Saanich_010 | 1.793 | 0.4080 | 0.0084 | 0.1275 | 0.0088 | 0.0000 | 0.0000 | NAN |
| Saanich_100 | 26.400 | 0.1344 | 0.0092 | 0.0817 | 0.0105 | 0.0000 | 0.0000 | NAN |
| Saanich_120 | 21.302 | 0.1782 | 0.0100 | 0.0978 | 0.0018 | 0.0000 | 0.0000 | NAN |
| Saanich_135 | 15.917 | 0.1296 | 0.0166 | 0.0706 | 0.0018 | 0.0000 | 0.0000 | NAN |
| Saanich_150 | 5.278 | 2.1754 | 0.0293 | 0.1127 | 0.0018 | 0.0000 | 0.0000 | NAN |
| Saanich_165 | 0.000 | 4.7095 | 0.2112 | 0.0805 | 0.0053 | 3.5027 | 0.0423 | NAN |
| Saanich_200 | 0.000 | 7.3582 | 0.2816 | 0.0000 | 0.0000 | 17.9867 | 0.0006 | NAN |

| rowname | N2O_nM | Std_N2O_nM | CH4_nM | Std_CH4_nM | Temperature_C | Conductivity_mScm_1 | Fluorescence_mgm_3 | OxygenSBE_V | Salinity_PSU | Density_q |
|---|---|---|---|---|---|---|---|---|---|---|
| Saanich_010 | 0.849 | 0.114 | 1030.478 | 3.070 | 12.854 | 33.534 | 3.521 | 4.954 | 28.121 | 21.098 |
| Saanich_100 | 18.087 | 1.275 | 3.231 | 0.392 | 8.703 | 32.970 | 0.109 | 0.872 | 30.872 | 23.933 |
| Saanich_120 | 16.304 | 1.085 | 3.463 | 0.519 | 8.796 | 33.188 | 0.197 | 0.742 | 31.007 | 24.026 |
| Saanich_135 | 12.909 | 2.577 | 4.815 | 0.658 | 8.882 | 33.345 | 0.108 | 0.469 | 31.088 | 24.076 |
| Saanich_150 | 11.815 | 0.000 | 8.323 | 0.000 | 9.002 | 33.526 | 0.181 | 0.089 | 31.164 | 24.118 |
| Saanich_165 | 6.310 | 0.732 | 23.831 | 2.291 | 9.041 | 33.597 | 0.132 | 0.069 | 31.197 | 24.138 |
| Saanich_200 | 0.000 | 0.000 | 774.034 | 12.745 | 9.117 | 33.727 | 0.236 | 0.063 | 31.248 | 24.167 |
1B.
| rowname | Chao1 | se.chao1 | Shannon | Depth_m | O2_uM | PO4_uM | SiO2_uM |
|---|---|---|---|---|---|---|---|
| Saanich_010 | 566.0000 | 0.0000000 | 5.046089 | 10 | 216.667 | 0.520 | 20.672 |
| Saanich_100 | 764.7500 | 1.4237029 | 5.140263 | 100 | 38.012 | 3.672 | 69.062 |
| Saanich_120 | 798.5000 | 3.4452856 | 4.811825 | 120 | 32.354 | 4.090 | 64.383 |
| Saanich_135 | 645.2500 | 1.7327981 | 4.185610 | 135 | 20.446 | 4.342 | 70.321 |
| Saanich_150 | 572.0000 | 1.8177848 | 3.328242 | 150 | 0.000 | 4.988 | 70.780 |
| Saanich_165 | 483.1429 | 0.4869678 | 3.164047 | 165 | 0.000 | 5.599 | 62.580 |
| Saanich_200 | 514.2500 | 1.7327730 | 2.971639 | 200 | 0.000 | 6.298 | 66.200 |

| rowname | NO3_uM | NH4_uM | Std_NH4_uM | NO2_uM | Std_NO2_uM | H2S_uM | Std_H2S_uM | Cells.ml |
|---|---|---|---|---|---|---|---|---|
| Saanich_010 | 1.793 | 0.4080 | 0.0084 | 0.1275 | 0.0088 | 0.0000 | 0.0000 | NAN |
| Saanich_100 | 26.400 | 0.1344 | 0.0092 | 0.0817 | 0.0105 | 0.0000 | 0.0000 | NAN |
| Saanich_120 | 21.302 | 0.1782 | 0.0100 | 0.0978 | 0.0018 | 0.0000 | 0.0000 | NAN |
| Saanich_135 | 15.917 | 0.1296 | 0.0166 | 0.0706 | 0.0018 | 0.0000 | 0.0000 | NAN |
| Saanich_150 | 5.278 | 2.1754 | 0.0293 | 0.1127 | 0.0018 | 0.0000 | 0.0000 | NAN |
| Saanich_165 | 0.000 | 4.7095 | 0.2112 | 0.0805 | 0.0053 | 3.5027 | 0.0423 | NAN |
| Saanich_200 | 0.000 | 7.3582 | 0.2816 | 0.0000 | 0.0000 | 17.9867 | 0.0006 | NAN |

| rowname | N2O_nM | Std_N2O_nM | CH4_nM | Std_CH4_nM | Temperature_C | Conductivity_mScm_1 | Fluorescence_mgm_3 | OxygenSBE_V | Salinity_PSU | Density_q |
|---|---|---|---|---|---|---|---|---|---|---|
| Saanich_010 | 0.849 | 0.114 | 1030.478 | 3.070 | 12.854 | 33.534 | 3.521 | 4.954 | 28.121 | 21.098 |
| Saanich_100 | 18.087 | 1.275 | 3.231 | 0.392 | 8.703 | 32.970 | 0.109 | 0.872 | 30.872 | 23.933 |
| Saanich_120 | 16.304 | 1.085 | 3.463 | 0.519 | 8.796 | 33.188 | 0.197 | 0.742 | 31.007 | 24.026 |
| Saanich_135 | 12.909 | 2.577 | 4.815 | 0.658 | 8.882 | 33.345 | 0.108 | 0.469 | 31.088 | 24.076 |
| Saanich_150 | 11.815 | 0.000 | 8.323 | 0.000 | 9.002 | 33.526 | 0.181 | 0.089 | 31.164 | 24.118 |
| Saanich_165 | 6.310 | 0.732 | 23.831 | 2.291 | 9.041 | 33.597 | 0.132 | 0.069 | 31.197 | 24.138 |
| Saanich_200 | 0.000 | 0.000 | 774.034 | 12.745 | 9.117 | 33.727 | 0.236 | 0.063 | 31.248 | 24.167 |
Figure 1. Chao1, Shannon Diversity Index, and five nutrient concentrations (uM) across seven depths analyzed using mothur (A) and QIIME2 (B). Alpha diversity plots across seven depths (C) and oxygen concentration (D) using mothur. Alpha diversity plots across seven depths (E) and oxygen concentration (F) generated using QIIME2. (G) All measured nutrient concentrations (uM) across seven depths.
Figure 2. (A) The analysis of the abundance of domain taxa across depths using mothur (B) The analysis of the abundance of domain taxa across depths using QIIME2.
Figure 3. (A) The analysis of the abundance of genera of family Oceanospirillaceae using mothur. (B) The analysis of the abundance of genera of family Oceanospirillaceae using QIIME2.
Regardless of the clustering method, the genus Balneatrix, which is the only genus of Oceanospirillaceae that is not halophilic (1), tends to inhabit shallower depths. In the primary data, we observe that salinity increases with depth, which indicates that the environment becomes more inhospitable for Balneatrix as depth increases.
Table 1. Oxygen sensor reading (OxygenSBE_V), salinity, and water density across Saanich Inlet depths
| Depth | OxygenSBE_V | Salinity_PSU | Density_q |
|---|---|---|---|
| Saanich, 10m | 4.954 | 28.121 | 21.098 |
| Saanich, 100m | 0.872 | 30.872 | 23.933 |
| Saanich, 120m | 0.742 | 31.007 | 24.026 |
| Saanich, 135m | 0.469 | 31.088 | 24.076 |
| Saanich, 150m | 0.089 | 31.164 | 24.118 |
| Saanich, 165m | 0.069 | 31.197 | 24.138 |
| Saanich, 200m | 0.063 | 31.248 | 24.167 |
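The salinity trend described before Table 1 can be checked directly from the tabulated values. The short base R sketch below uses a Spearman rank correlation, which is our choice for illustration (it only tests whether the relationship is monotonic) and was not part of the original analysis.

```r
# Values transcribed from Table 1.
depth_m      <- c(10, 100, 120, 135, 150, 165, 200)
oxygen_sbe_v <- c(4.954, 0.872, 0.742, 0.469, 0.089, 0.069, 0.063)
salinity_psu <- c(28.121, 30.872, 31.007, 31.088, 31.164, 31.197, 31.248)

# Spearman's rho is +1 or -1 when one variable changes strictly
# monotonically with the other.
rho_salinity <- cor(depth_m, salinity_psu, method = "spearman")
rho_oxygen   <- cor(depth_m, oxygen_sbe_v, method = "spearman")

# Total change in salinity between the shallowest and deepest sample.
salinity_range <- diff(range(salinity_psu))
```

Salinity rises at every sampled depth while the oxygen reading falls, so the two predictors are confounded in this data set; most of the salinity change occurs between 10m and 100m.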
As Figure 3A and 3B show, the QIIME2 analysis identified one more genus than mothur. No previous literature recommends which variable regions of the 16S rRNA gene are best targeted to distinguish members of the family Oceanospirillaceae, which could be one reason the classifications produced by the two methods are not consistent with one another.
The ASV approach used in QIIME2 infers the biological sequences present in the sample prior to the introduction of amplification and sequencing errors, which allows it to distinguish sequences that differ by as little as one nucleotide (9). This higher resolution in classification may be the reason QIIME2 was able to identify the genus Marinobacterium, which mothur did not detect at all.
For the mothur and QIIME2 analyses, the calculated p-values for abundance vs. depth are 0.001753 and 0.002838, respectively. For abundance vs. oxygen concentration, the calculated p-values are 0.01401 and 0.02645, respectively. In Figure 4A and 4B, the linear models of the abundance of Oceanospirillaceae across depth display a strong negative correlation within a narrow 95% confidence interval. In Figure 5A and 5B, the linear models of the abundance of Oceanospirillaceae across oxygen concentration display a strong positive correlation within a narrow 95% confidence interval. All of the p-values from the linear model analysis are less than 0.05; therefore, the abundance of Oceanospirillaceae differs significantly with depth and oxygen concentration.
Figure 4. Linear regression model of the abundance of Oceanospirillaceae across depth using mothur analysis (A) and QIIME2 analysis (B)
Figure 5. Linear regression model of the abundance of Oceanospirillaceae across oxygen concentration using mothur analysis (A) and QIIME2 analysis (B)
Across all samples, there are 25 OTUs within family Oceanospirillaceae according to the mothur analysis as illustrated in Table 2. For the QIIME2 analysis, there are 28 ASVs within family Oceanospirillaceae as shown in Table 2.
Table 2. The richness of Oceanospirillaceae across depths determined using mothur and QIIME2
| Depth | Number of taxa found, mothur | Number of taxa found, QIIME |
|---|---|---|
| Saanich, 10m | 16 | 6 |
| Saanich, 100m | 12 | 16 |
| Saanich, 120m | 7 | 16 |
| Saanich, 135m | 6 | 10 |
| Saanich, 150m | 5 | 10 |
| Saanich, 165m | 6 | 11 |
| Saanich, 200m | 9 | 6 |
| Overall richness | 25 | 28 |
Of the 25 OTUs identified within family Oceanospirillaceae using mothur, 12 change significantly in abundance with both depth and oxygen concentration. The OTUs that change significantly with depth are OTU0065, OTU0327, OTU0511, OTU0675, OTU0952, OTU0979, OTU1077, OTU1349, OTU1516, OTU1685, OTU1992, and OTU3677. The general trend in the linear model of each of these OTUs is a strong negative correlation that falls within a narrow 95% confidence interval, as shown in Figure 6A. The p-value for each of the significant OTUs is less than 0.05; all calculated p-values are listed in Table 3. The same OTUs were significant with respect to oxygen concentration, based on the p-values and the linear models in Figure 6C.
Table 3. p-values from linear models of abundance vs. depth and oxygen using mothur-produced data
| mothur OTU | Depth | Oxygen |
|---|---|---|
| #OTU0065 | 0.01105 | 0.00003483 |
| #OTU0084 | 0.1132 | 0.3495 |
| #OTU0090 | 0.9653 | 0.5526 |
| #OTU0104 | 0.7401 | 0.7596 |
| #OTU0117 | 0.5847 | 0.954 |
| #OTU0327 | 0.01511 | 0.0000997 |
| #OTU0418 | 0.7895 | 0.7514 |
| #OTU0511 | 0.01018 | 0.00003922 |
| #OTU0675 | 0.02419 | 0.0002626 |
| #OTU0857 | 0.2077 | 0.5905 |
| #OTU0952 | 0.0164 | 0.0001264 |
| #OTU0979 | 0.0164 | 0.0001264 |
| #OTU1077 | 0.0164 | 0.0001264 |
| #OTU1349 | 0.0164 | 0.0001264 |
| #OTU1516 | 0.0164 | 0.0001264 |
| #OTU1678 | 0.3893 | 0.5987 |
| #OTU1685 | 0.0164 | 0.0001264 |
| #OTU1730 | 0.4401 | 0.6055 |
| #OTU1992 | 0.0164 | 0.0001264 |
| #OTU2430 | 0.6864 | 0.9432 |
| #OTU3469 | 0.6864 | 0.9432 |
| #OTU3676 | 0.2077 | 0.5905 |
| #OTU3677 | 0.0164 | 0.0164 |
| #OTU3678 | 0.6864 | 0.9432 |
| #OTU3783 | 0.2077 | 0.5905 |
Of the 28 Oceanospirillaceae ASVs identified using QIIME2, four change significantly in abundance with depth, and three of these also change significantly with oxygen concentration. The four ASVs that change significantly with depth are ASV131, ASV213, ASV810, and ASV1414. According to Figure 7A, the general trend shown by each of the four ASVs is a strong negative correlation that falls within a narrow 95% confidence interval. Their respective p-values are less than 0.05; all calculated p-values are listed in Table 4. The ASVs that change significantly with respect to oxygen concentration are ASV131, ASV213, and ASV810. The general trends are consistent with those of the mothur analysis, according to the p-values and the linear models in Figure 7C.
Table 4. p-values from linear models of abundance vs. depth and oxygen using QIIME2-produced data
| QIIME ASV | Depth | Oxygen |
|---|---|---|
| #Asv107 | 0.5322 | 0.5905 |
| #Asv120 | 0.8776 | 0.5924 |
| #Asv131 | 0.0164 | 0.0001264 |
| #Asv213 | 0.01002 | 0.00003552 |
| #Asv243 | 0.8 | 0.6734 |
| #Asv417 | 0.9291 | 0.8889 |
| #Asv476 | 0.2077 | 0.5905 |
| #Asv668 | 0.2077 | 0.5905 |
| #Asv810 | 0.007237 | 0.0000101 |
| #Asv1050 | 0.9291 | 0.8889 |
| #Asv1152 | 0.707 | 0.8377 |
| #Asv1283 | 0.6896 | 0.7813 |
| #Asv1352 | 0.7976 | 0.6851 |
| #Asv1414 | 0.01526 | 0.09034 |
| #Asv1430 | 0.2077 | 0.5905 |
| #Asv1486 | 0.6641 | 0.8984 |
| #Asv1490 | 0.8502 | 0.799 |
| #Asv1570 | 0.6988 | 0.7065 |
| #Asv1593 | 0.9291 | 0.8889 |
| #Asv1797 | 0.2077 | 0.5905 |
| #Asv1977 | 0.2077 | 0.5905 |
| #Asv2036 | 0.885 | 0.7762 |
| #Asv2097 | 0.8012 | 0.6649 |
| #Asv2125 | 0.6864 | 0.9432 |
| #Asv2215 | 0.6864 | 0.9432 |
| #Asv2281 | 0.7822 | 0.3544 |
| #Asv2297 | 0.8776 | 0.7041 |
| #Asv2341 | 0.2077 | 0.5905 |
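The significance calls drawn from Table 4 can be reproduced with a short filter. The following is a minimal sketch, assuming the 0.05 threshold stated in the text; the p-values are copied from Table 4, while the function and variable names are illustrative and not part of the original analysis.

```python
# Significance threshold used in the text.
ALPHA = 0.05

# p-values copied from Table 4. Only a subset of rows is listed here:
# every ASV omitted from this dictionary has p > 0.2 for both predictors.
table4 = {
    "Asv131": {"depth": 0.0164, "oxygen": 0.0001264},
    "Asv213": {"depth": 0.01002, "oxygen": 0.00003552},
    "Asv810": {"depth": 0.007237, "oxygen": 0.0000101},
    "Asv1414": {"depth": 0.01526, "oxygen": 0.09034},
    "Asv476": {"depth": 0.2077, "oxygen": 0.5905},
    "Asv2281": {"depth": 0.7822, "oxygen": 0.3544},
}


def significant(pvalues, predictor, alpha=ALPHA):
    """Return the ASV names whose p-value for `predictor` is below `alpha`."""
    return [asv for asv, p in pvalues.items() if p[predictor] < alpha]


depth_hits = significant(table4, "depth")
oxygen_hits = significant(table4, "oxygen")
print("significant with depth: ", depth_hits)
print("significant with oxygen:", oxygen_hits)
```

Run on these rows, the filter separates the ASVs that pass the threshold for depth from those that pass it for oxygen, which makes it easy to see that Asv1414 passes only the depth test.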
Figure 6. Analysis of the abundance of OTUs within family Oceanospirillaceae across depth and oxygen concentration using mothur. Linear models are represented in (A) and (C) and the abundance plot is represented in (B).
Figure 7. Analysis of the abundance of ASVs within family Oceanospirillaceae across depth and oxygen concentration using QIIME. Linear models are represented in (A) and (C) and the abundance plot is represented in (B).
In assessing whether the abundance of Oceanospirillaceae differs across depth and oxygen concentration, both the mothur- and QIIME2-processed data indicated that it does differ significantly. Individual OTUs and ASVs within Oceanospirillaceae were also found to change significantly in abundance with each pipeline. Although mothur and QIIME2 gave generally the same answers to the assigned questions, differences were observed in certain analyses. For example, in Figure 1A and 1B, the Chao1 and Shannon diversity index values are greater for the QIIME2-processed data than for the mothur-processed data. Another prominent difference lies in the number of taxa assigned by each pipeline. According to Table 2, QIIME2 produced 3 more taxon assignments to Oceanospirillaceae than mothur. Also, in Figure 3A and 3B, greater genus diversity is observed in the QIIME2-processed data. Interestingly, 12 of the 25 OTUs were significant, compared to only 4 of the 28 ASVs.
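For readers unfamiliar with the two indices compared above, the following is a minimal sketch of how Shannon diversity and a bias-corrected Chao1 richness estimate can be computed from per-taxon read counts. The counts are hypothetical and are not data from this study, and the function names are illustrative; the pipelines' own implementations may differ in detail.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    s_obs = sum(1 for c in counts if c > 0)  # observed taxa
    f1 = sum(1 for c in counts if c == 1)    # singletons
    f2 = sum(1 for c in counts if c == 2)    # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical counts: the same 100 reads grouped coarsely (3 units) and
# finely (7 units, with the 20-read unit split into rarer variants).
coarse = [50, 30, 20]
fine = [50, 30, 10, 6, 2, 1, 1]

for name, counts in (("coarse", coarse), ("fine", fine)):
    print(name, "Shannon:", round(shannon(counts), 3), "Chao1:", chao1(counts))
```

In this toy case, splitting the same reads into more and rarer units raises both indices, which is one way a finer-grained pipeline could report higher Chao1 and Shannon values for the same sample.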
Using both the mothur and QIIME2 pipelines, we observed a large decrease in the abundance of Oceanospirillaceae genera with depth (Fig 3A, 3B). As the oxygen concentration versus depth graph in Fig 1G indicates, oxygen concentration decreases linearly with increasing depth, reaching 0 μM at 150 m and remaining at 0 μM at greater depths. This does not match the exponential decrease in abundance observed in Fig 3A, 3B, suggesting that Oceanospirillaceae, being obligate aerobes, may be highly sensitive to even minute changes in oxygen concentration. With QIIME2, we identified 4 ASVs that were significantly negatively correlated with depth: ASV131, ASV213, ASV810 and ASV1414 (Fig 7A). Also, approximately half of the OTUs analyzed were found to be significantly negatively correlated with depth (Fig 6A). Importantly, Oceanospirillaceae are known to be limited in their use of terminal electron acceptors, being able to use only oxygen [6]. Oceanospirillaceae are also not able to fix nitrogen. Therefore, it is unlikely that the observed decrease in abundance is influenced by changing concentrations of molecules such as NO2-, NO3-, NH4+, H2S, PO4^3- and others.
An obvious limitation of the data is that no samples are available from depths between 10 m and 100 m. This leaves a wide margin of statistical uncertainty when relating the alpha diversity of Oceanospirillaceae to oxygen concentration or depth, which explains the large grey areas in Fig 1C-1E. Nevertheless, we observed little disparity in abundance between the readings at 10 m and 100 m, for both OTUs and ASVs, so we hypothesize that Oceanospirillaceae abundance is fairly consistent within that range. In contrast, Figure 1G shows a clear drop in oxygen concentration of more than 60% at the 50 m mark. Although this decrease in oxygen concentration may not affect overall abundance, there is a distinct shift in genus-specific abundance. At 10 m, Pseudospirillum constitutes over 90% of the population, while at depths of 100 m, 120 m and 135 m, abundance is divided equally between Pseudospirillum and Balneatrix. It may be that the Balneatrix genus is less sensitive to drastic changes in oxygen concentration, although without functional analysis of sequence data this remains unknown.
Interestingly, despite Oceanospirillaceae being strictly aerobic, both mothur and QIIME2 detected low abundance at the anoxic depths of 150 m, 165 m and 200 m. Saanich Inlet is a seasonally anoxic fjord, so oxygen levels fluctuate seasonally at all depths; even at 200 m, oxygen concentrations can range from 15 μM to less than 3 μM [1]. It is possible that, although Figure 1G shows negligible oxygen concentrations at depths below 150 m, small periodic fluxes of dissolved oxygen allow small Oceanospirillaceae populations to survive, although this is unlikely given the time of year. Further, since 16S rRNA-based analyses do not discriminate between living and dead organisms in an environmental sample, some of the abundance at greater depths could be attributed to metabolically inactive cells and dead organic matter that sank from the near-surface waters.
With respect to the comparison between mothur and QIIME2 abundance detection, our data reflect findings in previous literature. QIIME2, which applies a de-noising algorithm prior to sequence clustering and is therefore more sensitive to single-nucleotide differences, produced 4 distinct genera: Balneatrix, Oleispira, Pseudospirillum and Marinobacterium, while mothur detected only 3 genera: Balneatrix, Oleispira and Pseudohongiella (Fig 3A, 3B). Given the different methods by which each pipeline clusters sequence data, this variation is expected. We also observed greater diversity at the genus level with QIIME2 than with mothur. The greater resolution at this taxonomic level provided by QIIME2 highlights its potential for distinguishing between genetically similar bacteria. For this reason, QIIME2 could be used, with a higher degree of sensitivity, to categorize, sort and process information from large data sets, such as primary sequence data from newly sampled environments. A weakness of both pipelines, however, is that there is no fixed definition of a bacterial “species”; the concept is highly subjective. This, combined with the exceedingly high degree of 16S rRNA similarity between bacteria at the genus level, makes it difficult to draw taxonomic conclusions from the data. Notably, both pipelines showed similar levels of abundance, indicating that either pipeline can be used to reliably determine bacterial abundance in environmental samples.
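The difference in resolution described above can be illustrated with a toy example. The sketch below is not mothur's clustering algorithm or QIIME2's de-noising step; it only contrasts grouping reads at an identity threshold with keeping every exact sequence. The 0.97 threshold is an assumption (a commonly used OTU cutoff that is not stated in this report), and the reads are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy example assumes pre-aligned, equal-length reads")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Greedy clustering: a read starts a new OTU only if it matches no
    existing centroid at or above the identity threshold."""
    centroids = []
    for s in seqs:
        if not any(identity(s, c) >= threshold for c in centroids):
            centroids.append(s)
    return centroids


def exact_variants(seqs):
    """Keep every distinct sequence, in order of first appearance."""
    return list(dict.fromkeys(seqs))


# Hypothetical 100-nt reads.
base = "ACGT" * 25
one_snp = "T" + base[1:]            # one mismatch to base (99% identity)
distant = "T" * 10 + base[10:]      # eight mismatches to base (92% identity)
reads = [base, base, one_snp, distant]

print("OTUs at 97% identity:", len(greedy_otus(reads)))
print("exact variants:      ", len(exact_variants(reads)))
```

Here the read carrying a single nucleotide difference is absorbed into the same OTU as its near-identical neighbour but is kept as a separate unit when exact sequences are tracked, which mirrors the higher resolution attributed to the ASV approach.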
In discussing the differences between statistically significant OTUs or ASVs versus depth, we acknowledge that unless accurate functional information is obtained from the sequence data, the genus classification of Oceanospirillaceae remains uncertain. In addition, little is known about the proteomics of Oceanospirillaceae, and the genotypes of their species are largely unexplored, making it more difficult to draw conclusions based on diversity and richness. However, our results indicate that QIIME2 might be more effective at determining richness, given that the Chao1 richness estimate from QIIME2 reflects the shift in genus abundance at each depth, indicating an increase in ASV richness with the increased abundance of Balneatrix at greater depths (Table 2). Although mothur detects the shift in genus abundance, it fails to detect the likely change in richness seen in Table 2. We hypothesize that this increase in ASVs at depths between 100 m and 165 m may be due to subtle nucleotide changes that correspond to homologs of metabolic genes in Balneatrix, which would enable them to inhabit these slightly less oxygenated depths and would influence enzyme adaptability in different environmental conditions.
It is crucial to note that the data used are severely limited and that sampling from depths between 10 m and 100 m is needed. With more varied sampling data, there may be greater confidence in using mothur for detailed characterization of samples at the genus level. Overall, although the analyses from both pipelines were effective, we believe they are not fully representative of the actual environment, and further tests are needed to characterize Oceanospirillaceae in Saanich Inlet more accurately. We conclude that QIIME2, with its greater sensitivity and ability to detect minute sequence differences, is an effective tool for estimating abundance, diversity and richness in biogeochemically fluctuating environments, particularly when working with limited sequence data [10]. Mothur, with its flexibility and precision due to its use of reference sequences during alignment and cleaning, would allow for more robust sample characterization, although with less coverage [10]. Despite the variation between the pipelines in terms of diversity, genus abundance and richness, analysis of Saanich Inlet sample data using both mothur and QIIME2 leads us to conclude that the overall abundance of Oceanospirillaceae decreases with depth as dissolved oxygen concentration declines.
Torres-Beltrán M, Hawley A, Capelle D, Zaikova E, Walsh D, Mueller A, Scofield M, Payne C, Pakhomova L, Kheirandish S, Finke J, Bhatia M, Shevchuk O, Gies E, Fairley D, Michiels C, Suttle C, Whitney F, Crowe S, Tortell P, Hallam S. 2017. A compendium of geochemical information from the Saanich Inlet water column. Sci Data 4:170159.
Belley R, Snelgrove P, Archambault P, Juniper S. 2016. Environmental Drivers of Benthic Flux Variation and Ecosystem Functioning in Salish Sea and Northeast Pacific Sediments. PLoS One 11(3):e0151110.
Grundle DS. 2007. MSc thesis. University of Victoria, Victoria, BC.
Manning C, Hamme R, Bourbonnais A. 2010. Impact of deep-water renewal events on fixed nitrogen loss from seasonally-anoxic Saanich Inlet. Mar Chem 122(1-4):1-10.
Callahan BJ, McMurdie PJ, Holmes SP. 2017. Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. ISME J 11:2639-2643.
He Y, Caporaso J, Jiang X, Sheng H, Huse S, Rideout J, Edgar R, Kopylova E, Walters W, Knight R, Zhou H. 2015. Stability of operational taxonomic units: an important but neglected property for analyzing microbial diversity. Microbiome 3(1):20.
Garrity G, Bell J, Lilburn T. 2007. FAMILY I. OCEANOSPIRILLACEAE. In Brenner, D, Krieg, N, Stanley, J (ed.), Bergey’s Manual® of Systematic Bacteriology: Volume 2: The Proteobacteria, Part B: The Gammaproteobacteria. Springer Science & Business Media, New York.
Satomi M, Fujii T. 2014. The Family Oceanospirillaceae. In Rosenberg, E, DeLong, E, Lory, S, Stackebrandt, E, Thompson, F (ed.), The Prokaryotes, 4th ed. Springer, Berlin.
Callahan BJ, McMurdie PJ, Holmes SP. 2017. Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. ISME J 11:2639-2643.
Plummer E, Twin J. 2015. A comparison of three bioinformatics pipelines for the analysis of preterm gut microbiota using 16S rRNA gene sequencing data. J Proteomics Bioinform 8(12). doi:10.4172/jpb.1000381
Callahan, B. J., McMurdie, P. J., & Holmes, S. P. (2017). Exact sequence variants should replace operational taxonomic units in marker gene data analysis. doi:10.1101/113597
Gaudet, A. D., Ramer, L. M., Nakonechny, J., Cragg, J. J., & Ramer, M. S. (2010). Small-Group Learning in an Upper-Level University Biology Class Enhances Academic Performance and Student Attitudes Toward Group Work. PLoS ONE, 5(12). doi:10.1371/journal.pone.0015821
Hallam, S. J., Torres-Beltrán, M., & Hawley, A. K. (2017). Monitoring microbial responses to ocean deoxygenation in a model oxygen minimum zone. Scientific Data, 4, 170158. doi:10.1038/sdata.2017.158
Hawley, A. K., Torres-Beltrán, M., Zaikova, E., Walsh, D. A., Mueller, A., Scofield, M., . . . Hallam, S. J. (2017). A compendium of multi-omic sequence information from the Saanich Inlet water column. Scientific Data, 4, 170160. doi:10.1038/sdata.2017.160
Kunin, V., Engelbrektson, A., Ochman, H., & Hugenholtz, P. (2010). Wrinkles in the rare biosphere: Pyrosequencing errors can lead to artificial inflation of diversity estimates. Environmental Microbiology, 12(1), 118-123. doi:10.1111/j.1462-2920.2009.02051.x
Sogin, M. L., Morrison, H. G., Huber, J. A., Welch, D. M., Huse, S. M., Neal, P. R., Arrieta, J. M., & Herndl, G. J. (2006). Microbial diversity in the deep sea and the underexplored “rare biosphere”. PNAS, 103(32), 12115-12120.
Torres-Beltrán, M., Hawley, A. K., Capelle, D., Zaikova, E., Walsh, D. A., Mueller, A., . . . Hallam, S. J. (2017). A compendium of geochemical information from the Saanich Inlet water column. Scientific Data, 4, 170159. doi:10.1038/sdata.2017.159
Welch, R. A., Burland, V., Plunkett III, G., Redford, P., Roesch, P., Rasko, D., Buckles, E. L., . . . Blattner, F. R. (2002). Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc. Natl Acad. Sci. USA, 99, 17020-17024.